



Modeling peripheral visual acuity enables discovery of gaze strategies at multiple time scales during natural scene search

Pavan Ramkumar
Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA
Department of Neurobiology, Northwestern University, Evanston, IL, USA

Hugo Fernandes
Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA
Instituto Gulbenkian de Ciência, Oeiras, Portugal

Konrad Kording
Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA

Mark Segraves
Department of Neurobiology, Northwestern University, Evanston, IL, USA

Like humans, monkeys make saccades nearly three times a second. To understand the factors guiding this frequent decision, computational models of vision attempt to predict fixation locations using bottom-up visual features and top-down goals. How do the relative influences of these factors evolve over multiple time scales? Here we analyzed visual features at fixations using a retinal transform that provides realistic visual acuity by suitably degrading visual information in the periphery. In a task in which monkeys searched for a Gabor target in natural scenes, we characterized the relative importance of bottom-up and task-relevant influences by decoding fixated from nonfixated image patches based on visual features. At fast time scales, we found that search strategies can vary over the course of a single trial, with locations of higher saliency, target-similarity, edge-energy, and orientedness looked at later on in the trial. At slow time scales, we found that search strategies can be refined over several weeks of practice, and the influence of target orientation was significant only in the latter of two search tasks. Critically, these results were not observed without applying the retinal transform. Our results suggest that saccade-guidance strategies become apparent only when models take into account degraded visual representation in the periphery.

Citation: Ramkumar, P., Fernandes, H., Kording, K., & Segraves, M. (2015). Modeling peripheral visual acuity enables discovery of gaze strategies at multiple time scales during natural scene search. Journal of Vision, 15(3):19, 1-20. http://www.journalofvision.org/content/15/3/19, doi: 10.1167/15.3.19.

Journal of Vision (2015) 15(3):19, 1-20. ISSN 1534-7362. © 2015 ARVO. Received June 4, 2014; published March 26, 2015.

Introduction

During naturalistic visual search or exploration, in order to bring objects of interest to the fovea, we move our eyes about three times a second. What factors influence where we look? The dominant working hypothesis for computational models of gaze suggests that the visual system computes a priority map of the visual field before every saccade, and a saccade is subsequently made to a target of high priority (Bisley & Goldberg, 2010; Fecteau & Munoz, 2006; Serences & Yantis, 2006; Treisman & Gelade, 1980; Wolfe, 1994).

What factors constitute this priority map? Studies have shown that many factors influence the guidance of eye movements. These factors can be generally grouped into task-independent, or bottom-up, features and task-relevant features. Examples of bottom-up features are luminance contrast (Reinagel & Zador, 1999; but see Einhäuser & König, 2003), color contrast (Itti, Koch, & Niebur, 1998), energy (Ganguli, Freeman, Rajashekar, & Simoncelli, 2010), and saliency (Itti & Koch, 2001; Itti et al., 1998); examples of task-specific features are relevance (resemblance to search target; Einhäuser, Rutishauser, & Koch, 2008); learned context about target location (Ehinger, Hidalgo-Sotelo, & Torralba, 2009); factors that are ecologically relevant but not specific to the task, such as intrinsic value (Gottlieb, 2012); and exploratory strategies for minimizing uncertainty (Gottlieb, Oudeyer, Lopes, & Baranes, 2013; Renninger, Verghese, & Coughlan, 2007). In addition, change in direction between successive saccades (Wilming, Harst, Schmidt, & König, 2013) and natural statistics of saccade magnitude and direction (Tatler & Vincent, 2009) also enable saccade prediction. Predictive models of eye movements are derived from priority maps comprising one or more of the above factors.

Although systematic comparisons of factors influencing saccade guidance have been made in humans (e.g., Hwang, Higgins, & Pomplun, 2009; Navalpakkam & Itti, 2006), similar studies are few and far between for nonhuman primates (NHPs; Berg, Boehnke, Marino, Munoz, & Itti, 2009; Fernandes, Stevenson, Phillips, Segraves, & Kording, 2013; Kano & Tomonaga, 2009). Since NHPs are important model organisms for understanding complex tasks performed by the visual system, it is important to rigorously model their gaze behavior in naturalistic conditions. However, although a multitude of factors constituting priority maps have been proposed in different contexts, how this notion of priority in the visual field evolves over time has not been addressed. Therefore, our goal in this study is to understand how search strategy in monkeys evolves over multiple time scales in naturalistic conditions, ranging from a few seconds of exploring a single image to learning to perform similar search tasks over several weeks.

To this end, we designed a natural search task in which two macaque monkeys searched for either a vertical or a horizontal Gabor target placed randomly within human-photographed scenes. To analyze gaze behavior from this task, we adopted two critical methodological innovations. First, although it is well known that visual features in the periphery are not sensed with the same acuity as those at the fovea, very few models of gaze have taken this into account (Zelinsky, 2008). Therefore, before computing visual features at fixation, we applied a simple computational retinal transform centered on the previous fixation that degrades peripheral visual information in a retinally realistic manner (Geisler & Perry, 1998). Second, we note that the features that comprise priority maps are often correlated; for instance, edges in natural scenes tend to have both high luminance contrast and high energy, and the apparent influence of luminance contrast might potentially be explained away by the influence of energy, or vice versa. Therefore, to tell apart the relative influence of correlated features, we applied multivariate decoding analysis of fixated versus nonfixated patches to quantify the influence of various visual features on gaze.

Over the course of viewing a single image, we found that monkeys fixated locations of greater saliency, target similarity (relevance), edge-energy, orientedness, and verticalness later on in the trial. Monkeys used short saccades to select locations scoring high in all features, with the exception of orientedness, which was selected for using long saccades. We also found a significant practice effect over several weeks, with target orientation having a significant effect only in the latter of two tasks performed. Thus, realistic modeling of peripheral vision and multivariate decoding allowed us to tease apart the relative influence of various features contributing to the priority map for saccade guidance and to track their influence over multiple time scales of interest.

Methods

Animals and surgery

Two female adult monkeys (Macaca mulatta), aged 14 and 15 years and identified as MAS15 and MAS16, participated in these experiments. MAS15 received an aseptic surgery to implant a subconjunctival wire search coil to record eye movements. Eye movements of MAS16 were measured using an infrared eye tracker (ISCAN Inc., Woburn, MA, http://www.iscaninc.com).

Stimuli and eye-movement recordings

Monkeys viewed grayscale human-photographed natural scenes (Fernandes et al., 2013; Phillips & Segraves, 2010). Each scene included a 2° × 2° Gabor wavelet target with position chosen randomly across a uniform distribution within the scene (Figure 1). The Gabor target was alpha-blended with the underlying scene pixels, giving it a transparency of 50%. Scenes were chosen from a library of 575 images in a pseudorandom sequence, and each scene was presented in at most 10 trials, with the Gabor target placed in different locations for each of those 10 trials.

In each trial, monkeys were given at most 20 saccades to find the target. Once the target was found, they had to fixate for a minimum of 300 ms to receive a water reward of approximately 0.20-0.25 ml per trial. Both monkeys were highly familiar with searching in natural scenes and had done versions of our task for previous studies.

We used the REX system (Hays, Richmond, & Optican, 1982), based on a PC computer running QNX (QNX Software Systems, Ottawa, Ontario, Canada), a real-time UNIX operating system, for behavioral control and eye position monitoring. Visual stimuli were generated by a second, independent graphics process (QNX-Photon) running on the same PC and rear-projected onto a tangent screen in front of the monkey by a CRT video projector (Sony VPH-D50; 75-Hz noninterlaced vertical scan rate; 1024 × 768 resolution). The distance between the front of the monkey's eye and the screen was 109 cm. The stimuli spanned 48° × 36° of visual field.

Saccade detection

Saccade beginnings and endings were detected using an in-house algorithm. We used thresholds of 100°/s for start and stop velocities and marked a saccade starting time when the velocity increased above this threshold continuously for 10 ms. Likewise, saccade-ending times were marked when the velocity decreased continuously for 10 ms and fell below 100°/s at the end of this period of decrease. We considered saccades longer than 150 ms as potentially resulting from eye-blinks or other artifacts and discarded them. Fixation locations were computed as the median (x, y) gaze coordinate in the intersaccadic interval. Since monkeys are head-fixed, they tend to make a few long-range saccades that are unlikely to occur in a naturalistic environment where the head is free. To mitigate the potential effect of large saccades, we restricted our analysis to saccades under 20° (we also reanalyzed the data with all saccades and did not find any qualitative differences in our results).
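
A minimal sketch of this velocity-threshold rule is given below. It is not the authors' in-house code: the 1-kHz sampling rate, array names, and parameter defaults are assumptions used only for illustration.

```python
import numpy as np

def detect_saccades(x, y, fs=1000.0, vel_thresh=100.0, sustain_ms=10, max_dur_ms=150):
    """Mark saccade starts/ends when eye speed stays above/below a threshold
    (deg/s) continuously for `sustain_ms`; drop candidates longer than `max_dur_ms`."""
    dt = 1.0 / fs
    vel = np.hypot(np.diff(x), np.diff(y)) / dt          # instantaneous speed (deg/s)
    fast = vel > vel_thresh
    n_sustain = int(sustain_ms * fs / 1000.0)
    saccades, i = [], 0
    while i < len(fast) - n_sustain:
        if np.all(fast[i:i + n_sustain]):                # above threshold for 10 ms: start
            j = i + n_sustain
            while j < len(fast) - n_sustain and not np.all(~fast[j:j + n_sustain]):
                j += 1                                   # advance until 10 ms continuously below threshold
            if (j - i) * dt * 1000.0 <= max_dur_ms:      # discard blink-like events (> 150 ms)
                saccades.append((i, j))
            i = j + n_sustain
        else:
            i += 1
    return saccades
```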

Modeling peripheral visual acuity with a retinal transform

It is well known that visual features cannot be resolved in the periphery with the same resolution as at the point of gaze, and that the representation of visual information in the brain is retinotopic. Despite this, the most influential models of eye gaze use visual features at fixation at full resolution, and most bottom-up saliency maps are computed in image-centered coordinates (e.g., Itti et al., 1998). However, when the oculomotor system computes a saliency map for each saccade, it can only work with retinotopic representations of local image statistics in the visual cortex. Thus, models of priority that take into account degraded representations of peripheral information (e.g., Hooge & Erkelens, 1999; Zelinsky et al., 2013) and analysis methods that look for effects of visual features at different saccade lengths might provide insight into the phenomena underlying the computation of priority in the brain. To model the peripheral sensitivity of the retina in a more realistic manner, we implemented a simple blurring filter.

The degraded image can be obtained as a parametric blurring of the original image, with the spatial extent of the blur filter scaled by the distance from the previous fixation. In practice, we precomputed a Gaussian pyramid filter bank at three levels and convolved it with the original image. We then selected appropriate pyramids for each eccentricity according to the implementation described in Geisler and Perry (1998).

First, we transformed the natural image to a Gaussian pyramid. An image pyramid is a collection of transformed images, each typically obtained by applying a convolution to the original image. Here we used a 3-level Gaussian pyramid, with each level downsampled by a factor of 2 and convolved with a Gaussian filter (Supplementary Figure S1). We used the implementation provided by the Matlab Pyramid Toolbox (http://www.cns.nyu.edu/~lcv/software.php). For each pyramid level, we calculated the maximally resolvable spatial frequency f as the Nyquist rate (half the sampling rate).

Figure 1. Experimental design. Two monkeys were asked to search for 2° × 2° Gabor-wavelet targets of either (A) vertical or (B) horizontal orientation, alpha-blended into natural scenes. Red arrows (not displayed to the monkeys) indicate the locations of the target. These target locations were distributed randomly with a uniform distribution across the scene. For each trial, the monkeys were permitted at most 20 fixations to locate the target.

Next, using the method by Geisler and Perry (1998), we calculated the critical eccentricity for a given resolvable spatial frequency according to known retinal physiology. The critical eccentricity $e_c$ is given by

$$e_c = \frac{e_2}{\alpha f}\ln\left(\frac{1}{CT_0}\right) - e_2 \qquad (1)$$

where $e_2$ is the half-resolution eccentricity, $\alpha$ is the spatial frequency decay constant, $f$ is the Nyquist rate, and $CT_0$ is the minimum contrast threshold. We used the standard parameters of $e_2 = 2.3$, $\alpha = 0.106$, and $CT_0 = 0.04$ used by Geisler and Perry (1998).

We then swept through the image pixel by pixel and assigned the grayscale value of the pyramid level whose best-resolvable frequency exceeds the one that can be resolved at the eccentricity of that pixel with respect to a given fixation location. Once this retinal transform is applied, the detectability of high-frequency features from far away is clearly degraded (Figure 2).
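
A minimal sketch of this pixelwise pyramid-level selection is shown below. It assumes a grayscale image, a fixed Gaussian blur per level, and a known pixel-to-degree conversion; the function and parameter names are illustrative, and the authors' actual implementation used the Matlab Pyramid Toolbox.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def retinal_transform(img, fix_xy, deg_per_px, e2=2.3, alpha=0.106, ct0=0.04, n_levels=3):
    """Eccentricity-dependent blur in the spirit of Geisler & Perry (1998):
    each pixel takes its value from the finest pyramid level whose Nyquist
    frequency is still resolvable at that pixel's eccentricity (Eq. 1)."""
    img = img.astype(float)
    h, w = img.shape
    levels = [img]                                   # level 0: full resolution
    f_nyq = [0.5 / deg_per_px]                       # Nyquist rate in cycles/deg
    cur = img
    for k in range(1, n_levels):
        cur = gaussian_filter(cur, sigma=1.0)[::2, ::2]          # blur + downsample by 2
        levels.append(zoom(cur, 2 ** k, order=1)[:h, :w])        # upsample back for pixelwise lookup
        f_nyq.append(f_nyq[0] / 2 ** k)
    # Critical eccentricity (deg) at which each level's Nyquist frequency is just resolvable.
    e_crit = [e2 / (alpha * f) * np.log(1.0 / ct0) - e2 for f in f_nyq]
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(xs - fix_xy[0], ys - fix_xy[1]) * deg_per_px  # eccentricity of every pixel
    out = levels[-1].copy()                                      # default: coarsest level
    for lvl in range(n_levels - 1, -1, -1):                      # overwrite with finer levels near fixation
        mask = ecc <= e_crit[lvl]
        out[mask] = levels[lvl][mask]
    return out
```

In this scheme, visual features at a fixation would then be computed from patches of the image transformed with respect to the previous fixation.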

Computation of visual features at fixation

To understand visual features that drive gaze behavior, we computed a number of features from image patches at fixated locations after peripherally blurring them using a retinal transform. The patch size specifies the size of the spotlight in the periphery over which visual features are pooled prior to saccade decisions. We chose patch sizes of 1° × 1°, 2° × 2°, and 4° × 4° and performed all analyses separately for each patch size.

Figure 2. The effect of peripheral blurring on sensitivity to visual features. Heat maps of saliency, edge-energy, and relevance are overlaid on the retinally transformed images. Left: No blur filter. Center and right: Blur filter applied at two different fixation locations, indicated by the yellow crosshair.

We computed one bottom-up feature (saliency), one task-specific feature (relevance), and three low-level orientation statistics (edge-energy, orientedness, and predominant orientation).

Bottom-up saliency

Bottom-up saliency is the extent to which an image patch stands out with respect to its immediate neighborhood. Typically, saliency at a particular location is operationally defined as the difference between a feature's (or set of features') value at that location and the feature's (or set of features') value in the immediately surrounding neighborhood. This center-surround difference map is typically computed for multiple features at multiple scales and then summed together to obtain a saliency map (Itti & Koch, 2001). A number of bottom-up saliency maps have been proposed (see Borji, Sihite, & Itti, 2013, for a survey). We used the most popular model (Itti et al., 1998), implemented in the Matlab Saliency Toolbox (www.saliencytoolbox.net; Walther & Koch, 2006), with eight orientations and four scales (see Figure 3A).

Top-down relevance or target-similarity

The relevance of a fixated image patch must measure the extent to which the information in the patch is relevant for the search task. We assume a greedy searcher who looks at patches that maximally resemble the target.

Figure 3. Computation of priority maps based on bottom-up saliency and top-down relevance. The yellow crosshair represents the fixation with respect to which the peripheral blurring operator is applied. (A) Saliency: Itti-Koch saliency (Itti et al., 1998) computes luminance contrast and orientation contrast at four different scales and eight different orientations and adds them across scales to produce a saliency map. (B) Relevance: A Gabor detector computes template Gabor filter outputs for a quadrature pair (in-phase Gabor and its 90° phase shift) and takes the sum of squares to produce a relevance map. Vertical and horizontal relevance maps are computed for modeling the vertical and horizontal tasks, respectively. Only the vertical relevance map is shown here.

Before computing a similarity measure to the target, we locally standardized the grayscale values of the patch and then linearly stretched them to lie between 0 and 1:

$$Z(x, y) = \frac{I(x, y) - \mu}{\sigma} \qquad (2)$$

$$U(x, y) = \frac{Z(x, y) - \min(Z)}{\max(Z) - \min(Z)} \qquad (3)$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the patch grayscale values. We then defined target relevance of a fixated image patch as the sum of squares of the convolution of the fixated image patch with the target Gabor wavelet and its quadrature phase pair:

$$R = (U \ast G)^2 + (U \ast G')^2 \qquad (4)$$

where $G$ and $G'$ are the Gabor target and its 90° phase-shifted version, and $\ast$ represents two-dimensional convolution. The scalar relevance value that is used for subsequent analysis is then the central pixel value of the convolved image, which happens to be equivalent to the sum of squares of dot products between the patch and the quadrature Gabor pairs for the 2° × 2° image patch.

For illustration, the relevance maps visualized in Figures 2 and 3B are obtained by taking the sum of squares of the convolution between the target Gabor quadrature pairs and the full-sized contrast-normalized image.
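
A sketch of this relevance computation for a single patch is given below (Eqs. 2-4). The Gabor templates are assumed to be supplied as arrays, and the names are illustrative.

```python
import numpy as np
from scipy.signal import fftconvolve

def patch_relevance(patch, gabor, gabor_quad):
    """Relevance of an image patch: squared responses to the target Gabor and
    its 90-deg phase-shifted (quadrature) partner, summed (Eqs. 2-4)."""
    z = (patch - patch.mean()) / patch.std()            # Eq. 2: local standardization
    u = (z - z.min()) / (z.max() - z.min())             # Eq. 3: stretch to [0, 1]
    r = fftconvolve(u, gabor, mode='same') ** 2 \
        + fftconvolve(u, gabor_quad, mode='same') ** 2  # Eq. 4: relevance map
    cy, cx = r.shape[0] // 2, r.shape[1] // 2
    return r[cy, cx]                                    # scalar relevance: central pixel value
```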

Orientation statistics

Aside from the composite measures of saliency and relevance, we extracted local orientation statistics using an eigenvalue decomposition of the energy tensor, computed as the covariance matrix of horizontal and vertical edge gradients at each pixel (Figure 4; Ganguli et al., 2010):

$$I_x(x, y) = \frac{\partial I(x, y)}{\partial x} \qquad (5)$$

$$I_y(x, y) = \frac{\partial I(x, y)}{\partial y} \qquad (6)$$

$$C(x, y) = \begin{pmatrix} \mathrm{cov}(I_x, I_x) & \mathrm{cov}(I_x, I_y) \\ \mathrm{cov}(I_x, I_y) & \mathrm{cov}(I_y, I_y) \end{pmatrix} \qquad (7)$$

where $I_x$ and $I_y$ are horizontal and vertical gradients and $C(x, y)$ is the energy tensor. Horizontal and vertical image gradients were computed as differences between consecutive columns and rows of the image, respectively. From the eigenvalues of $C(x, y)$, $\lambda_1$ and $\lambda_2$, we then computed the following orientation statistics.

The edge-energy of the patch is given by $E = \lambda_1 + \lambda_2$; the eigendirection $\theta = \tan^{-1}\left[\frac{\mathrm{cov}(I_x, I_x) - \lambda_2}{\mathrm{cov}(I_x, I_y)}\right]$ gives the predominant orientation; and the ratio of variance along the eigendirection and its orthogonal direction, $\phi = (\lambda_1 - \lambda_2)/(\lambda_1 + \lambda_2)$, gives the orientedness of the patch (0 for no orientation, 1 for bars and edges in any direction). We then computed a measure of verticalness as $\tau = |\sin\theta|$. Verticalness is 1 for a perfectly vertical patch and 0 for a perfectly horizontal patch.

Figure 4. Orientation statistics. For each natural scene, we computed the local orientation statistics at all fixated and shuffled-control patches: edge-energy, defined as the sum of squares of local gradients; orientedness, defined as a normalized ratio of eigenvalues of the local orientation covariance matrix, which measures the extent to which a single orientation is predominant; and the predominant orientation, defined as the eigendirection. Verticalness is defined as the absolute sine of the predominant orientation.
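
The sketch below computes these statistics for a single patch. The names are illustrative; it uses `arctan2` for the eigendirection to avoid division by zero in the closed-form expression above, and is only a minimal reading of Eqs. 5-7.

```python
import numpy as np

def orientation_statistics(patch):
    """Edge-energy, orientedness, predominant orientation, and verticalness
    from the 2 x 2 energy tensor (Eqs. 5-7), assuming a 2-D grayscale patch."""
    p = patch.astype(float)
    ix = np.diff(p, axis=1)[:-1, :]                    # horizontal gradient (column differences)
    iy = np.diff(p, axis=0)[:, :-1]                    # vertical gradient (row differences)
    c = np.cov(np.vstack([ix.ravel(), iy.ravel()]))    # energy tensor C (Eq. 7)
    lam2, lam1 = np.linalg.eigvalsh(c)                 # eigenvalues, ascending: lam1 >= lam2
    energy = lam1 + lam2                               # edge-energy E
    orientedness = (lam1 - lam2) / (lam1 + lam2 + 1e-12)
    theta = np.arctan2(c[0, 0] - lam2, c[0, 1])        # predominant orientation (eigendirection)
    verticalness = abs(np.sin(theta))                  # 1 for vertical, 0 for horizontal structure
    return energy, orientedness, theta, verticalness
```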

Analysis of fixation statistics

Our goal was to quantify the relative importance of each visual statistic in predicting fixation behavior and to study how these influences evolved over different time scales. The typical method to measure the predictive power of a visual statistic is to compare its value at fixated versus nonfixated points in the stimulus image. However, the stimuli are human-photographed scenes with objects of interest near the center; therefore, fixated locations tend to be closer to the center of the image than randomly sampled nonfixated locations. This is known as the center bias and can potentially inflate effect size for a given statistic (e.g., Kanan, Tong, Zhang, & Cottrell, 2009; Tseng, Carmi, Cameron, Munoz, & Itti, 2009). To avoid center bias, shuffled-control statistics are taken from the same location as the current fixation but from a different image. In all subsequent analyses of visual statistics at fixation, we compared real visual features at fixation with shuffled controls computed as described above, and always excluded the very last fixation to avoid confounds from the target itself.
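
A minimal sketch of this shuffled-control construction; the data structures and names are illustrative assumptions.

```python
import numpy as np

def shuffled_controls(fixations, image_ids, seed=0):
    """For each fixation (image_id, x, y), draw a control at the same (x, y)
    but from a different, randomly chosen image, to control for center bias."""
    rng = np.random.default_rng(seed)
    controls = []
    for img_id, x, y in fixations:
        others = [i for i in image_ids if i != img_id]
        controls.append((int(rng.choice(others)), x, y))
    return controls
```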

Fourier analysis

Local power spectra of natural image patches are highly informative about scene texture and structure (Oliva & Torralba, 2001) and can be diagnostic of natural scene categories (Torralba & Oliva, 2003). In particular, the two-dimensional Fast Fourier Transform (FFT) allows us to visualize Fourier amplitude at different spatial frequencies and orientations at the same time. To obtain a qualitative understanding of whether visual statistics at fixation were biased by the sought-after Gabor target, we studied the similarity between the target and fixated image patches by examining their spectral properties (see, e.g., Hwang et al., 2009). We computed local Fourier amplitude (absolute value of the FFT) in 1° × 1°, 2° × 2°, and 4° × 4° windows at each fixation, using the real image and a shuffle-control image. We averaged the obtained power spectra across fixations for each trial and then computed a two-tailed t statistic across trials of the difference in log absolute FFT between fixated patches and shuffled controls.
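
The per-frequency t map described above can be sketched as follows; the function names and the trial-wise data structures are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_1samp

def fft_log_amplitude(patch):
    """Log Fourier amplitude of a square patch (DC-centered)."""
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(patch))) + 1e-12)

def spectral_bias(fixated_by_trial, shuffled_by_trial):
    """Two-tailed t statistic, across trials, of the difference in mean log FFT
    amplitude between fixated patches and shuffled controls. Inputs are lists
    (one entry per trial) of lists of equally sized patches."""
    diffs = []
    for fix_patches, shuf_patches in zip(fixated_by_trial, shuffled_by_trial):
        fix_mean = np.mean([fft_log_amplitude(p) for p in fix_patches], axis=0)
        shuf_mean = np.mean([fft_log_amplitude(p) for p in shuf_patches], axis=0)
        diffs.append(fix_mean - shuf_mean)
    t_map, p_map = ttest_1samp(np.stack(diffs), popmean=0.0, axis=0)
    return t_map, p_map
```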

Receiver operating characteristic analysis

To study the extent to which each visual feature could predict gaze, we performed a receiver operating characteristic (ROC) curve analysis comparing the distributions of fixated patches and shuffled controls over each visual feature. We computed the area under the ROC curve (AUC) for each feature and bootstrapped across fixations to obtain 95% confidence intervals (CIs) for the AUCs.
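
A sketch of this single-feature AUC with a bootstrapped CI follows; the use of scikit-learn and the argument names are assumptions for illustration only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(feature_fixated, feature_shuffled, n_boot=1000, seed=0):
    """AUC for discriminating fixated patches from shuffled controls on one
    feature, with a bootstrapped 95% confidence interval."""
    rng = np.random.default_rng(seed)
    y = np.r_[np.ones(len(feature_fixated)), np.zeros(len(feature_shuffled))]
    x = np.r_[feature_fixated, feature_shuffled]
    auc = roc_auc_score(y, x)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))     # resample fixations/controls with replacement
        if len(np.unique(y[idx])) < 2:            # skip degenerate resamples
            continue
        boots.append(roc_auc_score(y[idx], x[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return auc, (lo, hi)
```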

Analysis of visual feature influence within a trial

Although analysis of visual features at fixations across trials and sessions provides a general idea of which features influence saccade planning, it implicitly assumes that the strategy of planning saccades remains unchanged over time. Since our goal is to obtain an insight into how search strategy evolves over time, we studied the difference between visual features at early and late fixations. To understand how search strategy evolves within a single trial, we computed the Spearman rank correlation coefficient between different visual features and two eye-movement variables, the numbered order of fixation within a trial (fixation order) and the saccade length made to reach that fixation location, along with their corresponding bootstrapped 95% confidence intervals.
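
A minimal sketch of this correlation analysis, with illustrative names:

```python
import numpy as np
from scipy.stats import spearmanr

def spearman_with_ci(feature, order_or_length, n_boot=1000, seed=0):
    """Spearman rank correlation between a visual feature at fixation and
    fixation order (or saccade length), with a bootstrapped 95% CI."""
    feature = np.asarray(feature)
    covariate = np.asarray(order_or_length)
    rho, p = spearmanr(feature, covariate)
    rng = np.random.default_rng(seed)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(feature), len(feature))   # resample fixations with replacement
        boots.append(spearmanr(feature[idx], covariate[idx])[0])
    return rho, p, np.percentile(boots, [2.5, 97.5])
```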

Analysis of visual feature influence over sessions using multivariate decoding

Although ROC analysis provides a useful characterization of the influence of individual features, it is a metric that compares two one-dimensional distributions. The AUC is a good measure of the importance of a given variable, but comparing two AUCs obtained with two different variables is not meaningful as a metric of relative importance if the variables are correlated. However, most features that inform bottom-up saliency are correlated with each other and with local contrast (Kienzle, Franz, Schölkopf, & Wichmann, 2009). Further, in our task, the Gabor-similarity metric (relevance) may be correlated with orientation or edge-energy. Therefore, a large AUC for relevance could merely result from the effect of a large AUC for edge-energy, or vice versa.

Multivariate analysis is a suitable tool for potentially explaining away the effects of correlated variables. For instance, over the past few years, generalized linear models (GLMs; Nelder & Wedderburn, 1972) have been used to model spike trains when neural activity may depend on multiple, potentially correlated variables (Fernandes et al., 2013). Here we used a particular GLM, multivariate logistic regression, to discriminate fixated patches from shuffled controls based on the local visual features defined above: saliency, relevance, edge-energy, orientedness, and verticalness, defined as the sine of the predominant orientation.

Using this decoding technique, we estimated effect sizes for each feature using partial models, leaving each feature out of the full model comprising all features and comparing these partial models with the full model (see Appendix A for details on logistic regression, goodness-of-fit characterization, and effect size estimation). For this analysis, we pooled fixations across four consecutive sessions for each search task and each animal.
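
The leave-one-feature-out effect-size estimate can be sketched as follows. Appendix A of the paper specifies the exact goodness-of-fit measure; this sketch substitutes a McFadden-style pseudo-R² on held-out predictions, and the feature ordering and function signatures are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

FEATURES = ['saliency', 'relevance', 'edge_energy', 'orientedness', 'verticalness']

def pseudo_r2(y, p_hat):
    """McFadden-style pseudo-R^2 of predicted probabilities against labels."""
    eps = 1e-12
    ll_model = np.sum(y * np.log(p_hat + eps) + (1 - y) * np.log(1 - p_hat + eps))
    p_null = np.mean(y)
    ll_null = np.sum(y * np.log(p_null) + (1 - y) * np.log(1 - p_null))
    return 1.0 - ll_model / ll_null

def relative_effect_sizes(X, y, cv=10):
    """Effect size of each feature = drop in cross-validated pseudo-R^2 when
    that feature is left out of the full logistic-regression model.
    X: (n_patches, 5) feature matrix, y: 1 for fixated, 0 for shuffled control."""
    def cv_r2(cols):
        p_hat = cross_val_predict(LogisticRegression(max_iter=1000), X[:, cols], y,
                                  cv=cv, method='predict_proba')[:, 1]
        return pseudo_r2(y, p_hat)
    full = cv_r2(list(range(X.shape[1])))
    effects = {}
    for j, name in enumerate(FEATURES):
        partial = cv_r2([k for k in range(X.shape[1]) if k != j])
        effects[name] = full - partial      # relative pseudo-R^2 attributable to the feature
    return full, effects
```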

Results

To understand the relative contributions of visual features that guide saccades during naturalistic vision, we had two monkeys search for embedded Gabor wavelets in human-photographed natural scenes. Monkeys performed the task over two sets of four consecutive days, with the sets differing on the orientation of the target: vertical in one set and horizontal in the other. We modeled the effects of decreasing retinal acuity as a function of eccentricity and analyzed visual features at fixated locations in natural scenes to ask how the features of the target stimulus (task-relevant) as well as other features of the visual scene (bottom-up) affect the search behavior. We then studied how the influence of these visual features evolved across multiple time scales.

Search performance

We first wanted to see if the monkeys could successfully perform the task, that is, if they could successfully find the target in a considerable number of the 700-1500 trials they completed on each of the eight days. We found that while one monkey (MAS15) performed better than the other (MAS16), as measured by the fraction of successful trials (65% vs. 41%) and by the mean number of fixations required to find the target (7.1 vs. 8.3), both monkeys were able to find the target in each session (Supplementary Figure S2). Although there are differences in performance, consistent with previously observed behavior of both monkeys (Kuo, Kording, & Segraves, 2012), they were able to perform the task.

Fourier analysis reveals the effect of edge-energy and target orientation

Gabor wavelets have a well-defined spectral signature (Figure 5, first row). Thus, we may reasonably expect saccade target selection to be guided by the local frequency content in natural scenes. To get an insight into this possibility, we computed 2-D Fourier power spectra at fixated and shuffled-control patches at three different patch sizes (1° × 1°, 2° × 2°, and 4° × 4°). We observed that the fixated patches have higher amplitude than nonfixated patches (positive t scores, Figure 5), suggesting that monkeys are biased to look at locations of high amplitude. As expected, high edge-energy patches are attractive targets for fixations.

Crucially, this same Fourier analysis allows us to inquire if the peaks in energy are consistent with the target orientation. Indeed, we also observed that the fixated patches have higher energy along the target orientation. In particular, we found higher energy along the vertical direction for the vertical task and along the horizontal direction for the horizontal task (Figure 5), and this effect was more pronounced for 4° × 4° patches than 1° × 1° patches. The greater concentration of target power for the 4° × 4° case might be consistent with motor noise in executing saccades, where the saccade lands slightly away from the intended location. Thus, Fourier analysis provides a clear and intuitive visualization that monkeys fixate spots where the target orientation is dominant, and this effect is better detected when visual features are averaged over larger areas in the periphery.

ROC Analysis

Fourier analysis suggested that energy and task-specific target orientation were diagnostic of fixated patches during natural scene search. Motivated by this observation, we intended to quantify the ability of visual features to predict fixations. We extracted local visual features (saliency, edge-energy, relevance, orientedness, and verticalness; see Methods) from fixated patches and shuffled controls and then compared them using ROC analysis. We found that, on average across animals and sessions for each task, fixated patches were more salient, had higher edge-energy, were more relevant, and had a higher verticalness than shuffled-control patches regardless of the task; however, orientedness was not diagnostic of the difference between fixated and nonfixated patches¹ (Figure 6). According to the analysis of one feature at a time, saliency, edge-energy, relevance, and verticalness are all predictive of fixations.

Figure 5. Spectral analysis of fixation locations. Monkey fixation locations exhibit target-dependent orientation bias. (A) Vertical task. (B) Horizontal task. Top row: Target Gabor for the search task (left) and its power spectrum (right). Bottom three rows: the mean t statistic, across four consecutive days, of the difference between log Fourier power of fixated patches and shuffled controls for monkeys MAS15 (left) and MAS16 (right), for three different patch sizes, from top to bottom: 1° × 1°, 2° × 2°, and 4° × 4°. Fourier power is clearly enhanced along the direction of target orientation. Units on the color bars indicate the mean t score.


Evolution of search strategy within a trial

Our goal was to study the temporal evolution of search strategy at multiple time scales. To understand how saccade planning evolved at the relatively short time scale within a single trial, we correlated the visual features at fixation with the fixation order (first, second, third, etc.) and the length of the saccade made to land at that fixation. We observed that fixated locations had different features at early and late fixations. In particular, saliency, edge-energy, relevance, orientedness, and verticalness of fixated patches increased from the beginning to the end of the trial (Figure 7 and Supplementary Figure S3, filled black diamonds; p < 0.0001 for both tasks). We also found that saliency, edge-energy, relevance, and verticalness of fixated patches were negatively correlated (p < 0.0001 for both animals and tasks) with the length of the saccade made to get to that location (Figure 7 and Supplementary Figure S2, filled blue squares). By contrast, orientedness was positively correlated with fixation number and saccade length (Figure 7 and Supplementary Figure S2). These effects are abolished, as expected, when the same analysis is performed with shuffled controls instead of fixated patches (unfilled black diamonds and blue squares in Figure 7 and Supplementary Figure S2). Crucially, these effects are only consistent across animals and tasks when a retinal transform is applied (last data point in all panels of Figure 7 and Supplementary Figure S2). We further verified that these phenomena are independent; that is, although patches of high relevance or edge-energy are fixated early on in the trial by short saccades, this is not merely because early saccades are short on average. In particular, we found significant but very weak correlations between fixation order and saccade length for either monkey (across tasks, r = 0.03 for the vertical task and r = 0.02 for the horizontal task; p < 0.01 for both correlations), which were not sufficient to explain the high correlations observed with visual features at fixation. Together, these results show how search strategy evolves during the trial, with predictive visual features being fixated later in the trial.

Multivariate analysis of the evolution of searchstrategy over days

Although ROC analysis suggests that saliency, relevance, and edge-energy can predict fixations, one fundamental limitation of the technique is that it can only quantify the effect of one variable at a time. However, in practice, these variables tend to be correlated. In our data, for instance, relevance is clearly correlated with edge-energy (ρ = 0.25, p < 0.0001; 16 sessions, two animals, 2° × 2° patch with retinal transform, 57,775 fixations), and edge-energy is correlated with saliency (ρ = 0.02, p < 0.0001). The correlation is problematic because the true effect of edge-energy might manifest as an apparent effect of saliency or relevance. To address this concern, we applied multivariate logistic regression to decode fixated from nonfixated patches as a function of local visual features (see Methods).

Visual features could discriminate fixated fromnonfixated patches

Using multivariate logistic regression, we asked if fixated patches could be decoded from shuffled-control patches. We found that fixated patches and shuffled controls could be decoded reliably above chance level (50%). Decoding accuracies did not differ significantly across tasks (vertical vs. horizontal target). Fixated patches were better decoded when peripheral visual patches were not degraded by the retinal transform (last column of Table 1).

Figure 6. Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distribution of visual features at fixated patches and shuffled controls across approximately 55,000 fixations (pooled across 16 sessions from two monkeys) for each task and each patch size. Black bars represent the vertical Gabor task and red curves represent the horizontal task. Error bars show bootstrapped 95% confidence intervals.

Figure 7. How does search strategy evolve within a trial? Spearman rank correlation coefficients between fixation order (black diamonds) or saccade length (blue squares) and visual features (saliency, edge-energy, relevance, orientedness, and verticalness) are shown for one monkey (MAS15) for each task. Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls. Error bars show bootstrapped 95% confidence intervals. The symbols on the x-axis indicate the image processing done prior to feature computation: the first three data points correspond to a retinal transform followed by averaging over 1° × 1°, 2° × 2°, and 4° × 4° windows, respectively; the fourth data point indicates averaging over a 2° × 2° window without applying any retinal transform. The star indicates a significance level of p < 0.0001.

Relative influence of visual features on saccade choice

Having established that the visual features at fixation could predict eye movement behavior, we were interested in the relative influence of individual features. By decoding fixated patches from shuffled controls in the two tasks, we found that edge-energy was predominantly more influential than all other features in predicting fixations, and saliency was completely explained away for both monkeys (top two panels in Figure 8, Table 2). These results appear to agree with findings in human studies, which suggest that bottom-up saliency plays a minimal role in explaining saccades during search tasks (e.g., Einhäuser et al., 2008). However, it must be noted that if edge-energy is a bottom-up feature, then bottom-up features appear to play an important role in gaze prediction even during search.

Given the demands of the search task (finding a Gabor target in a natural scene), it is worth considering the possibility that edge-energy is not strictly a bottom-up feature, but that locations with high edge-energy are sought after by the monkeys simply because Gabor wavelets have high edge-energy. Although seeking high-energy patches is one possible search strategy, our Fourier analysis (Figure 5) and ROC analysis (Figure 6) suggest that target similarity (relevance) or target orientation also plays a role. Indeed, when using multivariate decoding, we found that task-relevant features had nonzero effects (bottom three panels in Figure 8, Table 2). In particular, relevance played an important role for MAS15 but not MAS16, whereas the reverse was true for orientedness (Figure 8, panels 3 and 4 from above; Table 2). On the basis of this result, it is not possible to conclude whether the internal search template guiding top-down saccade choice is comprised of the target in its entirety (relevance), orientedness, or the target orientation (verticalness); one dominant strategy does not win out across data from both animals and tasks, but all seem to be important.

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategy over consecutive days, monkey MAS15 had performed the vertical search task ahead of the horizontal task, and the order of tasks was swapped for MAS16. Since these tasks were separated by a few weeks, we had the opportunity to compare search strategy across tasks. Indeed, for the 2° × 2° window, we observed a significant difference in the effect size of verticalness (a feature that captures the predominant orientation) across the two tasks. In particular, for MAS15, who had performed the horizontal task later, verticalness is significantly predictive of saccades in the horizontal task, with shuffled-control patches being more vertical than fixated patches (b ± SE = −0.0546 ± 0.0161, p = 0.0009; see Table 2B and bottom panels in Figure 8). For MAS16, who had performed the vertical task later, verticalness is significantly predictive of saccades in the vertical task, with fixated patches being more vertical than shuffled-control patches (b ± SE = 0.1232 ± 0.0201, p < 0.0001; see Table 2C and bottom panels in Figure 8). This practice effect of verticalness also correlates with search performance as quantified by the number of fixations required to find the target. In particular, a Wilcoxon rank-sum test for different median number of fixations for earlier and later tasks suggests that both monkeys improve across tasks (p < 0.00001, z = 29.1, median of 6 vs. 4 fixations for MAS15, and p < 0.00001, z = 8.8, median of 8 vs. 7 fixations for MAS16). From these results, it appears that animals gradually learn to place more emphasis on the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.

| | Retinal transform, 1° × 1° | Retinal transform, 2° × 2° | Retinal transform, 4° × 4° | No retinal transform, 2° × 2° |
|---|---|---|---|---|
| MAS15, vertical task | 55.7 ± 0.9 | 55.5 ± 0.6 | 56.8 ± 0.5 | 61.5 ± 0.7 |
| MAS15, horizontal task | 56.5 ± 0.6 | 56.4 ± 0.8 | 56.6 ± 0.8 | 64.3 ± 1.0 |
| MAS16, vertical task | 58.4 ± 0.6 | 60.2 ± 1.3 | 60.5 ± 1.4 | 62.9 ± 0.7 |
| MAS16, horizontal task | 59.2 ± 0.8 | 60.9 ± 0.6 | 62.3 ± 1.3 | 62.3 ± 1.0 |

Table 1. Decoding accuracies (mean percentage of correctly predicted patches and standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.

Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance, and how they evolved over time, when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant features while allowing for bottom-up features to explain them away. We found that, over short time scales of a single trial, monkeys preferred to fixate target locations with high saliency, edge-energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades.

As the trial progressed, locations of higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated. Over longer time scales of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features are shown for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.

Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and log-likelihoods, chi-squared (χ²) statistics, and pseudo-R²s (R_D²), as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A: MAS15, vertical target. LR χ²(6) = 704.4, p = 0; log-likelihood = −23992; R_D² = 0.0143 (standardized pseudo-R², test set).

| Variable | b | SE | t | p | Likelihood ratio | p |
|---|---|---|---|---|---|---|
| Constant | 0.0180 | 0.0108 | 1.6706 | 0.0954 | | |
| Saliency | 0.0085 | 0.0108 | 0.7809 | 0.4503 | 0.6797 | 0.9442 |
| **Edge-energy** | 0.2612 | 0.0131 | 19.8995 | <0.0001 | 455.2963 | 0 |
| **Relevance** | 0.1154 | 0.0133 | 8.6978 | <0.0001 | 83.1493 | <0.0001 |
| Orientedness | 0.0268 | 0.0120 | 2.2281 | 0.0319 | 5.0343 | 0.3035 |
| Verticalness | 0.0096 | 0.0126 | 0.7664 | 0.4607 | 0.6666 | 0.9450 |

Part B: MAS15, horizontal target. LR χ²(6) = 704.4, p = 0; log-likelihood = −16714; R_D² = 0.0203 (standardized pseudo-R², test set).

| Variable | b | SE | t | p | Likelihood ratio | p |
|---|---|---|---|---|---|---|
| Constant | 0.0337 | 0.0130 | 2.5918 | 0.0097 | | |
| Saliency | 0.0287 | 0.0131 | 2.1859 | 0.0341 | 4.8705 | 0.3187 |
| **Edge-energy** | 0.2880 | 0.0177 | 16.2238 | <0.0001 | 316.5875 | 0 |
| **Relevance** | 0.1889 | 0.0176 | 10.7034 | <0.0001 | 131.5860 | 0 |
| Orientedness | 0.0359 | 0.0163 | 2.2012 | 0.0417 | 4.9799 | 0.3190 |
| **Verticalness** | −0.0546 | 0.0161 | −3.3997 | 0.0009 | 11.6278 | 0.0258 |

Part C: MAS16, vertical target. LR χ²(6) = 769.5, p = 0; log-likelihood = −10576; R_D² = 0.0346 (standardized pseudo-R², test set).

| Variable | b | SE | t | p | Likelihood ratio | p |
|---|---|---|---|---|---|---|
| Constant | 0.0861 | 0.0165 | 5.2211 | <0.0001 | | |
| Saliency | 0.0032 | 0.0164 | 0.1958 | 0.7475 | 0.1470 | 0.9953 |
| **Edge-energy** | 0.5386 | 0.0253 | 21.2555 | <0.0001 | 584.6116 | 0 |
| Relevance | 0.0380 | 0.0197 | 1.9298 | 0.0621 | 3.9210 | 0.4398 |
| **Orientedness** | 0.1494 | 0.0198 | 7.5346 | <0.0001 | 57.1285 | <0.0001 |
| **Verticalness** | 0.1232 | 0.0201 | 6.1210 | <0.0001 | 37.5927 | <0.0001 |

Part D: MAS16, horizontal target. LR χ²(6) = 1099.4, p = 0; log-likelihood = −14001; R_D² = 0.0372 (standardized pseudo-R², test set).

| Variable | b | SE | t | p | Likelihood ratio | p |
|---|---|---|---|---|---|---|
| Constant | 0.0775 | 0.0144 | 5.3812 | <0.0001 | | |
| Saliency | 0.0069 | 0.0143 | 0.4846 | 0.6132 | 0.4056 | 0.9694 |
| **Edge-energy** | 0.6241 | 0.0236 | 26.4207 | <0.0001 | 953.3296 | 0 |
| Relevance | 0.0025 | 0.0164 | 0.1493 | 0.7738 | 0.1008 | 0.9982 |
| **Orientedness** | 0.0707 | 0.0178 | 3.9771 | 0.0002 | 15.9725 | 0.0059 |
| Verticalness | 0.0224 | 0.0173 | 1.2956 | 0.2320 | 1.8565 | 0.7618 |

The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge-energy more than MAS15 (Figure 8), which, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D auto-correlation function of visual features around fixations: that of edge-energy might be more spread out than that of saliency or relevance or other features.

One limitation of our approach is that although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between certain visual features and gaze. For instance, in our current design, since Gabor targets have high edge-energy and we have not parametrically varied edge-energy content across experimental sessions, it is not possible to say whether edge-energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies, we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings on monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework, studying visual features at fixation to infer saccade-planning strategy, is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans but generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge-energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled targets (Rajashekar, Bovik, & Cormack, 2006). They also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed a clear difference in power spectra between fixated versus nonfixated patches, and also between these spectral differences across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note to the interpretation of power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished while that of task-relevant features is enhanced during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge-energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and the fact that the influence of bottom-up features is diminished during search, our study might not be ideal to examine the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge-energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, to model peripheral degradation for feature extraction and to deploy an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their predictive power of fixation locations, their use of the Pearson correlation coefficient and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that maximizes an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness, horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search in naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.

Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19. [PubMed] [Article]

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.

Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.

Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34. [PubMed] [Article]

Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.

Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.

Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.

Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.

Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.

Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2. [PubMed] [Article]

Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.

Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.

Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.

Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]

Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98: Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.

Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.

Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.

Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., Mack, M. L. V. R., Fischer, M., Murray, W., & Hill, R. W. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel (Ed.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.

Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.

Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]

Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.

Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.

Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]

Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.

Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.

Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.

Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.

Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.

Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]

Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.

Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).

Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 135, 370–384.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.

Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).

Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]

Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.

Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]

Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.

't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.

Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.

Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.

Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.

Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.

Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.

Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of a successful outcome given by

$$p(y = 1 \mid X, b) = \frac{1}{1 + e^{-Xb}} \qquad (A1)$$

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

$$y \sim \mathrm{Bernoulli}\{p(y = 1 \mid X, b)\} \qquad (A2)$$

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients b that minimize the negative log likelihood of the model, given by

$$\mathcal{L} = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i) \qquad (A3)$$


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using iterative, gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
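A minimal sketch of this fitting step. The paper used Matlab's glmfit; the statsmodels call below is an assumed Python equivalent, with X holding one row per patch (fixated and shuffled controls) and y the corresponding 0/1 labels.

```python
import numpy as np
import statsmodels.api as sm

def fit_fixation_model(X, y):
    """Logistic regression of fixated (y = 1) vs. shuffled-control (y = 0) patches.
    X: n_patches x n_features array of visual features; a bias term is added."""
    Xb = sm.add_constant(X, has_constant='add')          # bias (intercept) column
    model = sm.GLM(y, Xb, family=sm.families.Binomial())  # Bernoulli/logit GLM
    result = model.fit()                                  # analogous in spirit to glmfit
    return result        # result.params holds b; result.llf is the log-likelihood
```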

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

$$D_0 = -2 \log(L_0 / L_{\mathrm{sat}}) \qquad (A4)$$

$$D = -2 \log(L / L_{\mathrm{sat}}) \qquad (A5)$$

where L_0, L_sat, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

$$D - D_0 = -2 \log(L / L_0) \qquad (A6)$$

A negative likelihood ratio suggests that the model under consideration outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
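A small follow-up sketch of this likelihood-ratio comparison against an intercept-only null model, reusing a fitted result like the one returned above; converting the ratio to a p-value through the χ² distribution follows the correspondence described in the text.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def likelihood_ratio_test(result_full, y):
    """Compare a fitted logistic model against the intercept-only (null) model."""
    null = sm.GLM(y, np.ones((len(y), 1)), family=sm.families.Binomial()).fit()
    lr = -2.0 * (result_full.llf - null.llf)   # D - D0; negative when the full model wins
    df = result_full.df_model                  # number of independent variables
    p_value = chi2.sf(-lr, df)                 # chi-squared test on the size of the improvement
    return lr, p_value
```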

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, R²_D = 1 − L/L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called relative pseudo-R² (Fernandes et al., 2013). The relative measure is defined as 1 − L_full/L_partial, where L_full and L_partial are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors on 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
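A minimal sketch of the leave-one-feature-out relative pseudo-R² with 10-fold cross-validation. The use of scikit-learn's KFold and the helper names (_fit, _loglik) are illustrative assumptions, not the authors' Matlab implementation.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import KFold

def _fit(X, y):
    # logistic regression with a bias term (as in Appendix A)
    return sm.GLM(y, sm.add_constant(X, has_constant='add'),
                  family=sm.families.Binomial()).fit()

def _loglik(result, X, y):
    # Bernoulli log-likelihood of held-out data under a fitted model
    p = result.predict(sm.add_constant(X, has_constant='add'))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def relative_pseudo_r2(X, y, feature_idx, n_folds=10):
    """Relative pseudo-R2 (1 - L_full / L_partial) for one left-out feature,
    evaluated on held-out folds; returns mean and standard error across folds."""
    scores = []
    for train, test in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(X):
        X_partial = np.delete(X, feature_idx, axis=1)
        full, partial = _fit(X[train], y[train]), _fit(X_partial[train], y[train])
        L_full = _loglik(full, X[test], y[test])
        L_partial = _loglik(partial, X_partial[test], y[test])
        scores.append(1.0 - L_full / L_partial)
    return np.mean(scores), np.std(scores) / np.sqrt(n_folds)
```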



target location (Ehinger, Hidalgo-Sotelo, & Torralba, 2009); factors that are ecologically relevant but not specific to the task, such as intrinsic value (Gottlieb, 2012); and exploratory strategies for minimizing uncertainty (Gottlieb, Oudeyer, Lopes, & Baranes, 2013; Renninger, Verghese, & Coughlan, 2007). In addition, change in direction between successive saccades (Wilming, Harst, Schmidt, & König, 2013) and natural statistics of saccade magnitude and direction (Tatler & Vincent, 2009) also enable saccade prediction. Predictive models of eye movements are derived from priority maps comprising one or more of the above factors.

Although systematic comparisons of factors influencing saccade guidance have been made in humans (e.g., Hwang, Higgins, & Pomplun, 2009; Navalpakkam & Itti, 2006), similar studies are few and far between for nonhuman primates (NHPs; Berg, Boehnke, Marino, Munoz, & Itti, 2009; Fernandes, Stevenson, Phillips, Segraves, & Kording, 2013; Kano & Tomonaga, 2009). Since NHPs are important model organisms for understanding complex tasks performed by the visual system, it is important to rigorously model their gaze behavior in naturalistic conditions. However, although a multitude of factors constituting priority maps have been proposed in different contexts, how this notion of priority in the visual field evolves over time has not been addressed. Therefore, our goal in this study is to understand how search strategy in monkeys evolves over multiple time scales in naturalistic conditions, ranging from a few seconds of exploring a single image to learning to perform similar search tasks over several weeks.

To this end, we designed a natural search task in which two macaque monkeys searched for either a vertical or a horizontal Gabor target placed randomly within human-photographed scenes. To analyze gaze behavior from this task, we adopted two critical methodological innovations. First, although it is well known that visual features in the periphery are not sensed with the same acuity as those at the fovea, very few models of gaze have taken this into account (Zelinsky, 2008). Therefore, before computing visual features at fixation, we applied a simple computational retinal transform, centered on the previous fixation, that degrades peripheral visual information in a retinally realistic manner (Geisler & Perry, 1998). Second, we note that the features that comprise priority maps are often correlated; for instance, edges in natural scenes tend to have both high luminance contrast and high energy, and the apparent influence of luminance contrast might potentially be explained away by the influence of energy, or vice versa. Therefore, to tell apart the relative influence of correlated features, we applied multivariate decoding analysis of fixated versus nonfixated patches to quantify the influence of various visual features on gaze.

Over the course of viewing a single image, we found that monkeys fixated locations of greater saliency, target similarity (relevance), edge–energy, orientedness, and verticalness later on in the trial. Monkeys used short saccades to select locations scoring high in all features, with the exception of orientedness, which was selected for using long saccades. We also found a significant practice effect over several weeks, with target orientation having a significant effect only in the latter of two tasks performed. Thus, realistic modeling of peripheral vision and multivariate decoding allowed us to tease apart the relative influence of various features contributing to the priority map for saccade guidance, and to track their influence over multiple time scales of interest.

Methods

Animals and surgery

Two female adult monkeys (Macaca mulatta), aged 14 and 15 years and identified as MAS15 and MAS16, participated in these experiments. MAS15 received an aseptic surgery to implant a subconjunctival wire search coil to record eye movements. Eye movements of MAS16 were measured using an infrared eye tracker (ISCAN Inc., Woburn, MA; http://www.iscaninc.com).

Stimuli and eye-movement recordings

Monkeys viewed grayscale human-photographed natural scenes (Fernandes et al., 2013; Phillips & Segraves, 2010). Each scene included a 2° × 2° Gabor wavelet target with position chosen randomly across a uniform distribution within the scene (Figure 1). The Gabor target was alpha-blended with the underlying scene pixels, giving it a transparency of 50%. Scenes were chosen from a library of 575 images in a pseudorandom sequence, and each scene was presented in at most 10 trials, with the Gabor target placed in different locations for each of those 10 trials.

In each trial, monkeys were given at most 20 saccades to find the target. Once the target was found, they had to fixate it for a minimum of 300 ms to receive a water reward of approximately 0.20–0.25 ml per trial. Both monkeys were highly familiar with searching in natural scenes and had done versions of our task for previous studies.

We used the REX system (Hays, Richmond, & Optican, 1982), based on a PC computer running QNX (QNX Software Systems, Ottawa, Ontario, Canada), a real-time UNIX operating system, for behavioral control and eye position monitoring. Visual stimuli were generated by a second, independent graphics process (QNX–Photon) running on the same PC and rear-projected onto a tangent screen in front of the monkey by a CRT video projector (Sony VPH-D50; 75 Hz noninterlaced vertical scan rate; 1024 × 768 resolution). The distance between the front of the monkey's eye and the screen was 109 cm. The stimuli spanned 48° × 36° of visual field.

Saccade detection

Saccade beginnings and endings were detected using an in-house algorithm. We used thresholds of 100°/s for start and stop velocities, and marked a saccade starting time when the velocity increased above this threshold continuously for 10 ms. Likewise, saccade-ending times were marked when the velocity decreased continuously for 10 ms and fell below 100°/s at the end of this period of decrease. We considered saccades longer than 150 ms as potentially resulting from eye blinks or other artifacts and discarded them. Fixation locations were computed as the median (x, y) gaze coordinate in the intersaccadic interval. Since the monkeys are head-fixed, they tend to make a few long-range saccades that are unlikely to occur in a naturalistic environment where the head is free. To mitigate the potential effect of large saccades, we restricted our analysis to saccades under 20° (we also reanalyzed the data with all saccades and did not find any qualitative differences in our results).
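A minimal sketch of this velocity-threshold detection, assuming a uniformly sampled eye-position trace in degrees; the variable names (eye_x, eye_y, fs) and the 1-kHz sampling rate are illustrative assumptions, and the offset rule is simplified to "speed stays below threshold for 10 ms" rather than the paper's monotonic-decrease criterion.

```python
import numpy as np

def detect_saccades(eye_x, eye_y, fs=1000.0, vel_thresh=100.0, min_dur_ms=10):
    """Return sample indices of saccade onsets and offsets based on eye speed (deg/s)."""
    speed = np.hypot(np.diff(eye_x), np.diff(eye_y)) * fs   # instantaneous speed
    above = speed > vel_thresh
    min_samples = int(min_dur_ms * fs / 1000.0)

    onsets, offsets, in_saccade = [], [], False
    for t in range(len(above) - min_samples):
        window = above[t:t + min_samples]
        if not in_saccade and window.all():        # above threshold for 10 ms -> onset
            onsets.append(t)
            in_saccade = True
        elif in_saccade and not window.any():      # below threshold for 10 ms -> offset
            offsets.append(t + min_samples)
            in_saccade = False
    return onsets, offsets
```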

Modeling peripheral visual acuity with a retinal transform

It is well known that visual features cannot be resolved in the periphery with the same resolution as at the point of gaze, and that the representation of visual information in the brain is retinotopic. Despite this, the most influential models of eye gaze use visual features at fixation at full resolution, and most bottom-up saliency maps are computed in image-centered coordinates (e.g., Itti et al., 1998). However, when the oculomotor system computes a saliency map for each saccade, it can only work with retinotopic representations of local image statistics in the visual cortex. Thus, models of priority that take into account degraded representations of peripheral information (e.g., Hooge & Erkelens, 1999; Zelinsky et al., 2013) and analysis methods that look for effects of visual features at different saccade lengths might provide insight into the phenomena underlying the computation of priority in the brain. To model the peripheral sensitivity of the retina in a more realistic manner, we implemented a simple blurring filter.

The degraded image can be obtained as a parametric blurring of the original image, with the spatial extent of the blur filter scaled by the distance from the previous fixation. In practice, we precomputed a Gaussian pyramid filter bank at three levels and convolved it with the original image. We then selected appropriate pyramid levels for each eccentricity according to the implementation described in Geisler and Perry (1998).

Figure 1. Experimental design. Two monkeys were asked to search for 2° × 2° Gabor-wavelet targets of either (A) vertical or (B) horizontal orientation, alpha-blended into natural scenes. Red arrows (not displayed to the monkeys) indicate the locations of the target. These target locations were distributed randomly with a uniform distribution across the scene. For each trial, the monkeys were permitted at most 20 fixations to locate the target.

First, we transformed the natural image to a Gaussian pyramid. An image pyramid is a collection of transformed images, each typically obtained by applying a convolution to the original image. Here we used a 3-level Gaussian pyramid, with each level downsampled by a factor of 2 and convolved with a Gaussian filter (Supplementary Figure S1). We used the implementation provided by the Matlab Pyramid Toolbox (http://www.cns.nyu.edu/lcv/software.php). For each pyramid level, we calculated the maximally resolvable spatial frequency f as the Nyquist rate (half the sampling rate).

Next, using the method by Geisler and Perry (1998), we calculated the critical eccentricity for a given resolvable spatial frequency according to known retinal physiology. The critical eccentricity e_c is given by

$$e_c = \frac{e_2}{\alpha f}\,\ln\!\left(\frac{1}{CT_0}\right) - e_2 \qquad (1)$$

where e_2 is the half-resolution eccentricity, α is the spatial frequency decay constant, f is the Nyquist rate, and CT_0 is the minimum contrast threshold. We used the standard parameters of e_2 = 2.3, α = 0.106, and CT_0 = 0.04 used by Geisler and Perry (1998).

We then swept through the image pixel by pixel and assigned the grayscale value of the pyramid level whose best-resolvable frequency exceeds the one that can be resolved at the eccentricity of that pixel with respect to a given fixation location. Once this retinal transform is applied, the detectability of high-frequency features far from the fixation point is clearly degraded (Figure 2).
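A minimal sketch of this level-selection rule with the parameters of Equation 1. The helper names (critical_eccentricity, retinal_transform) and the use of scipy's Gaussian filter and zoom are illustrative assumptions; the authors used the Matlab Pyramid Toolbox.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

E2, ALPHA, CT0 = 2.3, 0.106, 0.04   # parameters from Geisler and Perry (1998)

def critical_eccentricity(f_nyquist):
    # Equation 1: eccentricity (deg) beyond which frequency f_nyquist is unresolvable
    return (E2 / (ALPHA * f_nyquist)) * np.log(1.0 / CT0) - E2

def retinal_transform(image, fix_xy, deg_per_pixel, n_levels=3):
    """Assign each pixel the pyramid level still resolvable at its eccentricity."""
    h, w = image.shape
    levels, cur = [image.astype(float)], image.astype(float)
    for _ in range(1, n_levels):
        cur = gaussian_filter(cur, sigma=1.0)[::2, ::2]                   # blur + downsample by 2
        levels.append(zoom(cur, (h / cur.shape[0], w / cur.shape[1]), order=1))
    nyquist = [0.5 / (deg_per_pixel * 2 ** k) for k in range(n_levels)]   # cycles/deg per level
    ecc_limit = [critical_eccentricity(f) for f in nyquist]

    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(xs - fix_xy[0], ys - fix_xy[1]) * deg_per_pixel
    out = levels[-1].copy()                     # start with the coarsest level everywhere
    for k in range(n_levels - 1, -1, -1):       # overwrite with finer levels near fixation
        mask = ecc <= ecc_limit[k]
        out[mask] = levels[k][mask]
    return out
```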

Computation of visual features at fixation

Figure 2. The effect of peripheral blurring on sensitivity to visual features. Heat maps of saliency, edge–energy, and relevance are overlaid on the retinally transformed images. Left: No blur filter. Center and right: Blur filter applied at two different fixation locations, indicated by the yellow crosshair.

To understand the visual features that drive gaze behavior, we computed a number of features from image patches at fixated locations after peripherally blurring them using a retinal transform. The patch size specifies the size of the spotlight in the periphery over which visual features are pooled prior to saccade decisions. We chose patch sizes of 1° × 1°, 2° × 2°, and 4° × 4° and performed all analyses separately for each patch size.

We computed one bottom-up feature (saliency), one task-specific feature (relevance), and three low-level orientation statistics (edge–energy, orientedness, and predominant orientation).

Bottom-up saliency

Bottom-up saliency is the extent to which an image patch stands out with respect to its immediate neighborhood. Typically, saliency at a particular location is operationally defined as the difference between a feature's (or set of features') value at that location and the feature's (or set of features') value in the immediately surrounding neighborhood. This center-surround difference map is typically computed for multiple features at multiple scales and then summed together to obtain a saliency map (Itti & Koch, 2001). A number of bottom-up saliency maps have been proposed (see Borji, Sihite, & Itti, 2013, for a survey). We used the most popular model (Itti et al., 1998), implemented in the Matlab Saliency Toolbox (www.saliencytoolbox.net; Walther & Koch, 2006), with eight orientations and four scales (see Figure 3A).

Top-down relevance or target-similarity

Figure 3. Computation of priority maps based on bottom-up saliency and top-down relevance. The yellow crosshair represents the fixation with respect to which the peripheral blurring operator is applied. (A) Saliency: Itti-Koch saliency (Itti et al., 1998) computes luminance contrast and orientation contrast at four different scales and eight different orientations and adds them across scales to produce a saliency map. (B) Relevance: A Gabor detector computes template Gabor filter outputs for a quadrature pair (in-phase Gabor and its 90° phase shift) and takes the sum of squares to produce a relevance map. Vertical and horizontal relevance maps are computed for modeling the vertical and horizontal tasks, respectively. Only the vertical relevance map is shown here.

The relevance of a fixated image patch must measure the extent to which the information in the patch is relevant for the search task. We assume a greedy searcher who looks at patches that maximally resemble the target.

Before computing a similarity measure to the target, we locally standardized the grayscale values of the patch and then linearly stretched them to lie between 0 and 1:

$$Z(x, y) = \frac{I(x, y) - \mu}{\sigma} \qquad (2)$$

$$U(x, y) = \frac{Z(x, y) - \min(Z)}{\max(Z) - \min(Z)} \qquad (3)$$

We then defined the target relevance of a fixated image patch as the sum of squares of the convolution of the fixated image patch with the target Gabor wavelet and its quadrature phase pair:

$$R = (U * G)^2 + (U * G')^2 \qquad (4)$$

where G and G' are the Gabor target and its 90° phase-shifted version, and * represents two-dimensional convolution. The scalar relevance value that is used for subsequent analysis is then the central pixel value of the convolved image, which happens to be equivalent to the sum of squares of dot products between the patch and the quadrature Gabor pairs for the 2° × 2° image patch.

For illustration, the relevance maps visualized in Figures 2 and 3B are obtained by taking the sum of squares of the convolution between the target Gabor quadrature pairs and the full-sized, contrast-normalized image.
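A minimal sketch of this relevance computation for a single patch, following Equations 2–4. The Gabor parameters (size, wavelength, sigma) and the helper gabor_pair are illustrative assumptions, since the exact target parameters are not restated here.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_pair(size=64, wavelength=16.0, sigma=8.0, theta=0.0):
    """In-phase Gabor and its 90-degree phase-shifted (quadrature) partner."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / wavelength), env * np.sin(2 * np.pi * xr / wavelength)

def relevance(patch, G, G90):
    Z = (patch - patch.mean()) / patch.std()        # Equation 2: local standardization
    U = (Z - Z.min()) / (Z.max() - Z.min())         # Equation 3: stretch to [0, 1]
    # Equation 4: quadrature energy; the central value of the convolution is
    # equivalent to the squared dot products of the patch with the Gabor pair
    R = convolve2d(U, G, mode='same') ** 2 + convolve2d(U, G90, mode='same') ** 2
    return R[R.shape[0] // 2, R.shape[1] // 2]
```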

Orientation statistics

Aside from the composite measures of saliency and relevance, we extracted local orientation statistics using an eigenvalue decomposition of the energy tensor, computed as the covariance matrix of horizontal and vertical edge gradients at each pixel (Figure 4; Ganguli et al., 2010):

$$I_x(x, y) = \frac{\partial I(x, y)}{\partial x} \qquad (5)$$

$$I_y(x, y) = \frac{\partial I(x, y)}{\partial y} \qquad (6)$$

$$C(x, y) = \begin{pmatrix} \mathrm{cov}(I_x, I_x) & \mathrm{cov}(I_x, I_y) \\ \mathrm{cov}(I_x, I_y) & \mathrm{cov}(I_y, I_y) \end{pmatrix} \qquad (7)$$

where I_x and I_y are horizontal and vertical gradients and C(x, y) is the energy tensor. Horizontal and vertical image gradients were computed as differences between consecutive columns and rows of the image, respectively. From the eigenvalues of C(x, y), λ1 and λ2, we then computed the following orientation statistics.

The edge–energy of the patch is given by E = λ1 + λ2; the eigendirection θ = tan⁻¹[(cov(I_x, I_x) − λ2)/cov(I_x, I_y)] gives the predominant orientation; and the ratio of variance along the eigendirection and its orthogonal direction, φ = (λ1 − λ2)/(λ1 + λ2), gives the orientedness of the patch (0 for no orientation, 1 for bars and edges in any direction). We then computed a measure of verticalness as τ = |sin θ|. Verticalness is 1 for a perfectly vertical patch and 0 for a perfectly horizontal patch.
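A minimal sketch of these orientation statistics for one patch, following Equations 5–7; treating the gradient fields as samples for np.cov is an assumption about implementation detail.

```python
import numpy as np

def orientation_statistics(patch):
    """Edge-energy, orientedness, and verticalness of a grayscale image patch."""
    Ix = np.diff(patch, axis=1)[:-1, :]    # horizontal gradient (consecutive columns)
    Iy = np.diff(patch, axis=0)[:, :-1]    # vertical gradient (consecutive rows)
    C = np.cov(Ix.ravel(), Iy.ravel())     # 2 x 2 energy tensor (Equation 7)
    lam1, lam2 = sorted(np.linalg.eigvalsh(C), reverse=True)

    energy = lam1 + lam2                               # edge-energy E
    orientedness = (lam1 - lam2) / (lam1 + lam2)       # 0 = isotropic, 1 = single orientation
    theta = np.arctan2(C[0, 0] - lam2, C[0, 1])        # predominant orientation (eigendirection)
    verticalness = abs(np.sin(theta))                  # 1 = vertical, 0 = horizontal
    return energy, orientedness, verticalness
```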

Analysis of fixation statistics

Our goal was to quantify the relative importance of each visual statistic in predicting fixation behavior, and to study how these influences evolved over different time scales. The typical method to measure the predictive power of a visual statistic is to compare its value at fixated versus nonfixated points in the stimulus image. However, the stimuli are human-photographed scenes with objects of interest near the center; therefore, fixated locations tend to be closer to the center of the image than randomly sampled nonfixated locations are. This is known as the center bias and can potentially inflate the effect size for a given statistic (e.g., Kanan, Tong, Zhang, & Cottrell, 2009; Tseng, Carmi, Cameron, Munoz, & Itti, 2009). To avoid center bias, shuffled-control statistics are taken from the same location as the current fixation, but from a different image. In all subsequent analyses of visual statistics at fixation, we compared real visual features at fixation with shuffled controls computed as described above, and always excluded the very last fixation to avoid confounds from the target itself.

Fourier analysis

Local power spectra of natural image patches are highly informative about scene texture and structure (Oliva & Torralba, 2001) and can be diagnostic of natural scene categories (Torralba & Oliva, 2003). In particular, the two-dimensional Fast Fourier Transform (FFT) allows us to visualize Fourier amplitude at different spatial frequencies and orientations at the same time. To obtain a qualitative understanding of whether visual statistics at fixation were biased by the sought-after Gabor target, we studied the similarity between the target and fixated image patches by examining their spectral properties (see, e.g., Hwang et al., 2009). We computed the local Fourier amplitude (absolute value of the FFT) in 1° × 1°, 2° × 2°, and 4° × 4° windows at each fixation, using the real image and a shuffled-control image. We averaged the obtained power spectra across fixations for each trial and then computed a two-tailed t statistic across trials of the difference in log absolute FFT between fixated patches and shuffled controls.
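A minimal sketch of this spectral comparison, assuming fixated and shuffled-control patches have already been extracted and grouped by trial; the paired two-tailed t test via scipy is an assumption about how the across-trials statistic was computed.

```python
import numpy as np
from scipy import stats

def log_amplitude(patch):
    # log absolute 2-D FFT, zero frequency shifted to the center of the map
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(patch))) + 1e-12)

def spectral_t_map(fixated_by_trial, controls_by_trial):
    """Inputs: lists (one entry per trial) of lists of equally sized patches."""
    fix_means = np.array([np.mean([log_amplitude(p) for p in trial], axis=0)
                          for trial in fixated_by_trial])
    ctl_means = np.array([np.mean([log_amplitude(p) for p in trial], axis=0)
                          for trial in controls_by_trial])
    # paired, two-tailed t statistic across trials at every frequency bin
    t_map, _ = stats.ttest_rel(fix_means, ctl_means, axis=0)
    return t_map
```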

Receiver operating characteristic analysis

Figure 4. Orientation statistics. For each natural scene, we computed the local orientation statistics at all fixated and shuffled-control patches: edge–energy, defined as the sum of squares of local gradients; orientedness, defined as a normalized ratio of eigenvalues of the local orientation covariance matrix, which measures the extent to which a single orientation is predominant; and the predominant orientation, defined as the eigendirection. Verticalness is defined as the absolute sine of the predominant orientation.

To study the extent to which each visual feature could predict gaze, we performed a receiver operating characteristic (ROC) curve analysis comparing the distributions of fixated patches and shuffled controls over each visual feature. We computed the area under the ROC curve (AUC) for each feature and bootstrapped across fixations to obtain 95% confidence intervals (CIs) for the AUCs.
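A minimal sketch of the AUC computation with a bootstrapped confidence interval, using scikit-learn's roc_auc_score; the number of bootstrap resamples (1,000) is an illustrative assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(feature_fixated, feature_controls, n_boot=1000, seed=0):
    """AUC for one feature plus a bootstrapped 95% confidence interval."""
    rng = np.random.default_rng(seed)
    y = np.r_[np.ones(len(feature_fixated)), np.zeros(len(feature_controls))]
    x = np.r_[feature_fixated, feature_controls]
    auc = roc_auc_score(y, x)
    boot = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), len(x))       # resample fixations with replacement
        if len(np.unique(y[idx])) == 2:             # need both classes in the resample
            boot.append(roc_auc_score(y[idx], x[idx]))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return auc, (lo, hi)
```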

Analysis of visual feature influence within a trial

Although analyses of visual features at fixations across trials and sessions provide a general idea of which features influence saccade planning, they implicitly assume that the strategy of planning saccades remains unchanged over time. Since our goal is to obtain insight into how search strategy evolves over time, we studied the difference between visual features at early and late fixations. To understand how search strategy evolves within a single trial, we computed the Spearman's rank correlation coefficient between different visual features and two eye-movement variables, the numbered order of fixation within a trial (fixation order) and the saccade length made to reach that fixation location, along with their corresponding bootstrapped 95% confidence intervals.
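A minimal sketch of this within-trial correlation analysis using scipy's spearmanr; the dictionary-of-arrays input format is an assumption about how the fixations are tabulated.

```python
from scipy.stats import spearmanr

def within_trial_correlations(features, fixation_order, saccade_length):
    """features: dict mapping feature name -> per-fixation values (same length as
    fixation_order and saccade_length). Returns Spearman rho and p per feature."""
    results = {}
    for name, values in features.items():
        rho_order, p_order = spearmanr(values, fixation_order)
        rho_len, p_len = spearmanr(values, saccade_length)
        results[name] = {'fixation_order': (rho_order, p_order),
                         'saccade_length': (rho_len, p_len)}
    return results
```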

Analysis of visual feature influence over sessions using multivariate decoding

Although ROC analysis provides a useful characterization of the influence of individual features, it is a metric that compares two one-dimensional distributions. The AUC is a good measure of the importance of a given variable, but comparing two AUCs obtained with two different variables is not meaningful as a metric of relative importance if the variables are correlated. However, most features that inform bottom-up saliency are correlated with each other and with local contrast (Kienzle, Franz, Schölkopf, & Wichmann, 2009). Further, in our task, the Gabor-similarity metric (relevance) may be correlated with orientation or edge–energy. Therefore, a large AUC for relevance could merely result from the effect of a large AUC for edge–energy, or vice versa.

Multivariate analysis is a suitable tool for potentially explaining away the effects of correlated variables. For instance, over the past few years, generalized linear models (GLMs; Nelder & Wedderburn, 1972) have been used to model spike trains when neural activity may depend on multiple, potentially correlated variables (Fernandes et al., 2013). Here, we used a particular GLM, multivariate logistic regression, to discriminate fixated patches from shuffled controls based on the local visual features defined above: saliency, relevance, edge–energy, orientedness, and verticalness, defined as the sine of the predominant orientation.

Using this decoding technique, we estimated effect sizes for each feature using partial models, leaving each feature out of the full model comprising all features and comparing these partial models with the full model (see Appendix A for details on logistic regression, goodness-of-fit characterization, and effect-size estimation). For this analysis, we pooled fixations across four consecutive sessions for each search task and each animal.

Results

To understand the relative contributions of visual features that guide saccades during naturalistic vision, we had two monkeys search for embedded Gabor wavelets in human-photographed natural scenes. Monkeys performed the task over two sets of four consecutive days, with the sets differing in the orientation of the target: vertical in one set and horizontal in the other. We modeled the effects of decreasing retinal acuity as a function of eccentricity and analyzed visual features at fixated locations in natural scenes to ask how the features of the target stimulus (task-relevant) as well as other features of the visual scene (bottom-up) affect search behavior. We then studied how the influence of these visual features evolved across multiple time scales.

Search performance

We first wanted to see if the monkeys could successfully perform the task, that is, if they could successfully find the target in a considerable number of the 700–1500 trials they completed on each of the eight days. We found that, while one monkey (MAS15) performed better than the other (MAS16), as measured by the fraction of successful trials (65% vs. 41%) and by the mean number of fixations required to find the target (7.1 vs. 8.3), both monkeys were able to find the target in each session (Supplementary Figure S2). Although there are differences in performance, consistent with previously observed behavior of both monkeys (Kuo, Kording, & Segraves, 2012), they were able to perform the task.

Fourier analysis reveals the effect of edge–energy and target orientation

Gabor wavelets have a well-defined spectral signature (Figure 5, first row). Thus, we may reasonably expect saccade target selection to be guided by the local frequency content in natural scenes. To get an insight into this possibility, we computed 2-D Fourier power spectra at fixated and shuffled-control patches at three different patch sizes (1° × 1°, 2° × 2°, and 4° × 4°). We observed that the fixated patches have higher amplitude than nonfixated patches (positive t scores, Figure 5), suggesting that monkeys are biased to look at locations of high amplitude. As expected, high edge–energy patches are attractive targets for fixations.

Crucially, this same Fourier analysis allows us to inquire whether the peaks in energy are consistent with the target orientation. Indeed, we also observed that the fixated patches have higher energy along the target orientation. In particular, we found higher energy along the vertical direction for the vertical task and along the horizontal direction for the horizontal task (Figure 5), and this effect was more pronounced for 4° × 4° patches than for 1° × 1° patches. The greater concentration of target power for the 4° × 4° case might be consistent with motor noise in executing saccades, where the saccade lands slightly away from the intended location. Thus, Fourier analysis provides a clear and intuitive visualization that monkeys fixate spots where the target orientation is dominant, and this effect is better detected when visual features are averaged over larger areas in the periphery.

ROC Analysis

Fourier analysis suggested that energy and task-specific target orientation were diagnostic of fixated patches during natural scene search. Motivated by this observation, we sought to quantify the ability of visual features to predict fixations. We extracted local visual features (saliency, edge–energy, relevance, orientedness, and verticalness; see Methods) from fixated patches and shuffled controls, and then compared them using ROC analysis. We found that, on average across animals and sessions for each task, fixated patches were more salient, had higher edge–energy, were more relevant, and had higher verticalness than shuffled-control patches, regardless of the task; however, orientedness was not diagnostic of the difference between fixated and nonfixated patches¹ (Figure 6). According to the analysis of one feature at a time, saliency, edge–energy, relevance, and verticalness are all predictive of fixations.

Figure 5. Spectral analysis of fixation locations. Monkey fixation locations exhibit a target-dependent orientation bias. (A) Vertical task. (B) Horizontal task. Top row: Target Gabor for the search task (left) and its power spectrum (right). Bottom three rows: the mean t statistic across four consecutive days of the difference between log Fourier power of fixated patches and shuffled controls, for monkeys MAS15 (left) and MAS16 (right), for three different patch sizes, from top to bottom: 1° × 1°, 2° × 2°, and 4° × 4°. Fourier power is clearly enhanced along the direction of target orientation. Units on the color bars indicate the mean t score.


Evolution of search strategy within a trial

Our goal was to study the temporal evolution of search strategy at multiple time scales. To understand how saccade planning evolved at the relatively short time scale of a single trial, we correlated the visual features at fixation with the fixation order (first, second, third, etc.) and the length of the saccade made to land at that fixation. We observed that fixated locations had different features at early and late fixations. In particular, the saliency, edge–energy, relevance, orientedness, and verticalness of fixated patches increased from the beginning to the end of the trial (Figure 7 and Supplementary Figure S3, filled black diamonds; p < 0.0001 for both tasks). We also found that the saliency, edge–energy, relevance, and verticalness of fixated patches were negatively correlated (p < 0.0001 for both animals and tasks) with the length of the saccade made to get to that location (Figure 7 and Supplementary Figure S2, filled blue squares). By contrast, orientedness was positively correlated with fixation number and saccade length (Figure 7 and Supplementary Figure S2). These effects are abolished, as expected, when the same analysis is performed with shuffled controls instead of fixated patches (unfilled black diamonds and blue squares in Figure 7 and Supplementary Figure S2). Crucially, these effects are only consistent across animals and tasks when a retinal transform is applied (last data point in all panels of Figure 7 and Supplementary Figure S2). We further verified that these phenomena are independent; that is, although patches of high relevance or edge–energy are fixated early on in the trial by short saccades, this is not merely because early saccades are short on average. In particular, we found significant but very weak correlations between fixation order and saccade length for either monkey (across tasks, r = −0.03 for the vertical task and r = 0.02 for the horizontal task, p < 0.01 for both correlations), which were not sufficient to explain the high correlations observed with visual features at fixation. Together, these results show how search strategy evolves during the trial, with predictive visual features being fixated later in the trial.

Multivariate analysis of the evolution of search strategy over days

Although ROC analysis suggests that saliency, relevance, and edge–energy can predict fixations, one fundamental limitation of the technique is that it can only quantify the effect of one variable at a time. However, in practice, these variables tend to be correlated. In our data, for instance, relevance is clearly correlated with edge–energy (ρ = 0.25, p < 0.0001; 16 sessions, two animals, 2° × 2° patch with retinal transform, 57,775 fixations), and edge–energy is correlated with saliency (ρ = 0.02, p < 0.0001). The correlation is problematic because the true effect of edge–energy might manifest as an apparent effect of saliency or relevance. To address this concern, we applied multivariate logistic regression to decode fixated from nonfixated patches as a function of local visual features (see Methods).

Visual features could discriminate fixated from nonfixated patches

Figure 6. Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distribution of visual features at fixated patches and shuffled controls across approximately 55,000 fixations (pooled across 16 sessions from two monkeys) for each task and each patch size. Black bars represent the vertical Gabor task and red curves represent the horizontal task. Error bars show bootstrapped 95% confidence intervals.

Figure 7. How does search strategy evolve within a trial? Spearman's rank correlation coefficients between fixation order (black diamonds) or saccade length (blue squares) and the visual features saliency, edge–energy, relevance, orientedness, and verticalness are shown for one monkey (MAS15) for each task. Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls. Error bars show bootstrapped 95% confidence intervals. The symbols on the x-axis indicate the image processing done prior to feature computation: the first three data points correspond to a retinal transform followed by averaging over 1° × 1°, 2° × 2°, and 4° × 4° windows, respectively; the fourth data point indicates averaging over a 2° × 2° window without applying any retinal transform. The star indicates a significance level of p < 0.0001.

Using multivariate logistic regression, we asked if fixated patches could be decoded from shuffled-control patches. We found that fixated patches and shuffled controls could be decoded reliably above chance level (50%). Decoding accuracies did not differ significantly across tasks (vertical vs. horizontal target). Fixated patches were better decoded when peripheral visual patches were not degraded by the retinal transform (last column of Table 1).

Relative influence of visual features on saccade choice

Having established that the visual features at fixation could predict eye movement behavior, we were interested in the relative influence of individual features. By decoding fixated patches from shuffled controls in the two tasks, we found that edge–energy was predominantly more influential than all other features in predicting fixations, and saliency was completely explained away, for both monkeys (top two panels in Figure 8; Table 2). These results appear to agree with findings in human studies, which suggest that bottom-up saliency plays a minimal role in explaining saccades during search tasks (e.g., Einhäuser et al., 2008). However, it must be noted that if edge–energy is a bottom-up feature, then bottom-up features appear to play an important role in gaze prediction even during search.

Given the demands of the search task (finding a Gabor target in a natural scene), it is worth considering the possibility that edge–energy is not strictly a bottom-up feature, but that locations with high edge–energy are sought after by the monkeys simply because Gabor wavelets have high edge–energy. Although seeking high-energy patches is one possible search strategy, our Fourier analysis (Figure 5) and ROC analysis (Figure 6) suggest that target similarity (relevance) or target orientation also plays a role. Indeed, when using multivariate decoding, we found that task-relevant features had nonzero effects (bottom three panels in Figure 8; Table 2). In particular, relevance played an important role for MAS15 but not MAS16, whereas the reverse was true for orientedness (Figure 8, panels 3 and 4 from above; Table 2). On the basis of this result, it is not possible to conclude whether the internal search template guiding top-down saccade choice comprises the target in its entirety (relevance), orientedness, or the target orientation (verticalness); one dominant strategy does not win out across data from both animals and tasks, but all seem to be important.

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategy over consecutive days, monkey MAS15 had performed the vertical search task ahead of the horizontal task, and the order of tasks was swapped for MAS16. Since these tasks were separated by a few weeks, we had the opportunity to compare search strategy across tasks. Indeed, for the 2° × 2° window, we observed a significant difference in the effect size of verticalness (a feature that captures the predominant orientation) across the two tasks. In particular, for MAS15, who had performed the horizontal task later, verticalness is significantly predictive of saccades in the horizontal task, with shuffled-control patches being more vertical than fixated patches (b ± SE = −0.0546 ± 0.0161, p = 0.0009; see Table 2B and bottom panels in Figure 8). For MAS16, who had performed the vertical task later, verticalness is significantly predictive of saccades in the vertical task, with fixated patches being more vertical than shuffled-control patches (b ± SE = 0.1232 ± 0.0201, p < 0.0001; see Table 2C and bottom panels in Figure 8). This practice effect of verticalness also correlates with search performance, as quantified by the number of fixations required to find the target. In particular, a Wilcoxon rank-sum test for different median numbers of fixations in the earlier and later tasks suggests that both monkeys improved across tasks (p < 0.00001, z = 29.1, median of 6 vs. 4 fixations for MAS15, and p < 0.00001, z = 8.8, median of 8 vs. 7 fixations for MAS16). From these results, it appears that the animals gradually learn to place more emphasis on the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.

Table 1. Decoding accuracies (mean percentage of correctly predicted patches ± standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.

                          Retinal transform   Retinal transform   Retinal transform   No retinal transform
                          1° × 1°             2° × 2°             4° × 4°             2° × 2°
MAS15, vertical task      55.7 ± 0.9          55.5 ± 0.6          56.8 ± 0.5          61.5 ± 0.7
MAS15, horizontal task    56.5 ± 0.6          56.4 ± 0.8          56.6 ± 0.8          64.3 ± 1.0
MAS16, vertical task      58.4 ± 0.6          60.2 ± 1.3          60.5 ± 1.4          62.9 ± 0.7
MAS16, horizontal task    59.2 ± 0.8          60.9 ± 0.6          62.3 ± 1.3          62.3 ± 1.0

Discussion

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.

We set out to examine the contents of the visual priority map employed in saccade guidance, and how they evolved over time, when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant features while allowing for bottom-up features to explain them away. We found that, over the short time scale of a single trial, monkeys preferred to fixate target locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades.

As the trial progressed locations of higher bottom-upfeatures as well those of higher relevance for the searchtask were more likely to be fixated Over longer timescales of weeks animals were able to adapt their searchstrategy to seek our task-relevant features at saccadetargets and this improvement in strategy was associ-

Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and log-likelihoods, chi-squared (χ²) statistics, and pseudo-R²s (R_D²), as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A: MAS15, vertical target. LR χ²(6) = 704.4, p = 0; log-likelihood = −23992; R_D² = 0.0143, standardized pseudo-R² (test set).

Variable        β          SE        t          p          Likelihood ratio    p
Constant        0.0180     0.0108    1.6706     0.0954
Saliency        0.0085     0.0108    0.7809     0.4503     0.6797              0.9442
Edge–energy     0.2612     0.0131    19.8995    <0.0001    455.2963            0
Relevance       0.1154     0.0133    8.6978     <0.0001    83.1493             <0.0001
Orientedness    0.0268     0.0120    2.2281     0.0319     5.0343              0.3035
Verticalness    0.0096     0.0126    0.7664     0.4607     0.6666              0.9450

Part B: MAS15, horizontal target. LR χ²(6) = 704.4, p = 0; log-likelihood = −16714; R_D² = 0.0203, standardized pseudo-R² (test set).

Variable        β          SE        t          p          Likelihood ratio    p
Constant        0.0337     0.0130    2.5918     0.0097
Saliency        0.0287     0.0131    2.1859     0.0341     4.8705              0.3187
Edge–energy     0.2880     0.0177    16.2238    <0.0001    316.5875            0
Relevance       0.1889     0.0176    10.7034    <0.0001    131.5860            0
Orientedness    0.0359     0.0163    2.2012     0.0417     4.9799              0.3190
Verticalness    -0.0546    0.0161    -3.3997    0.0009     11.6278             0.0258

Part C: MAS16, vertical target. LR χ²(6) = 769.5, p = 0; log-likelihood = −10576; R_D² = 0.0346, standardized pseudo-R² (test set).

Variable        β          SE        t          p          Likelihood ratio    p
Constant        0.0861     0.0165    5.2211     <0.0001
Saliency        0.0032     0.0164    0.1958     0.7475     0.1470              0.9953
Edge–energy     0.5386     0.0253    21.2555    <0.0001    584.6116            0
Relevance       0.0380     0.0197    1.9298     0.0621     3.9210              0.4398
Orientedness    0.1494     0.0198    7.5346     <0.0001    57.1285             <0.0001
Verticalness    0.1232     0.0201    6.1210     <0.0001    37.5927             <0.0001

Part D: MAS16, horizontal target. LR χ²(6) = 1099.4, p = 0; log-likelihood = −14001; R_D² = 0.0372, standardized pseudo-R² (test set).

Variable        β          SE        t          p          Likelihood ratio    p
Constant        0.0775     0.0144    5.3812     <0.0001
Saliency        0.0069     0.0143    0.4846     0.6132     0.4056              0.9694
Edge–energy     0.6241     0.0236    26.4207    <0.0001    953.3296            0
Relevance       0.0025     0.0164    0.1493     0.7738     0.1008              0.9982
Orientedness    0.0707     0.0178    3.9771     0.0002     15.9725             0.0059
Verticalness    0.0224     0.0173    1.2956     0.2320     1.8565              0.7618
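Each part of Table 2 corresponds to one multivariate logistic regression (coefficients, standard errors, t scores, and p values) plus a set of leave-one-feature-out comparisons (likelihood ratios). A sketch of how such a table could be assembled around MATLAB's glmfit is shown below; the feature ordering, the variable names, and the chi-squared p value for the likelihood ratio are assumptions for illustration, not the authors' exact procedure.

    % One part of Table 2: multivariate logistic regression plus leave-one-feature-out comparisons.
    % X columns are assumed to be [saliency, edge-energy, relevance, orientedness, verticalness].
    [bFull, devFull, stats] = glmfit(X, y, 'binomial', 'link', 'logit');
    % stats.beta, stats.se, stats.t, and stats.p give the first four columns (Constant row first).
    features = {'Saliency', 'Edge-energy', 'Relevance', 'Orientedness', 'Verticalness'};
    for j = 1:numel(features)
        [~, devPart] = glmfit(X(:, setdiff(1:size(X, 2), j)), y, 'binomial', 'link', 'logit');
        LR  = devPart - devFull;      % = 2*(logL_full - logL_partial)
        pLR = 1 - chi2cdf(LR, 1);     % illustrative; the table's p values may be derived differently
        fprintf('%-13s LR = %9.4f, p = %.4f\n', features{j}, LR, pLR);
    end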

Over longer time scales of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 does (Figure 8), and edge–energy, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or the other features.

One limitation of our approach is that although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between certain visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and the factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies, we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings on monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce the statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority-map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans but generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). They also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed clear differences in power spectra between fixated and nonfixated patches, and also between these spectral differences across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note to the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished while that of task-relevant features is enhanced during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and the fact that the influence of bottom-up features is diminished during search, our study might not be ideal for examining the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, to model peripheral degradation for feature extraction and to deploy an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their predictive power of fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that maximizes an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness, horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search in naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).
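As a concrete illustration of this footnote, the AUC for a single feature can be computed from the feature values at fixated and shuffled-control patches; the sketch below uses MATLAB's perfcurve, which is an assumed implementation choice, and featFix and featCtrl are hypothetical placeholder variables.

    % AUC for a single visual feature: fixated patches vs. shuffled controls.
    scores = [featFix; featCtrl];
    labels = [ones(numel(featFix), 1); zeros(numel(featCtrl), 1)];
    [~, ~, ~, AUC] = perfcurve(labels, scores, 1);   % AUC > 0.5 suggests M(fixated) > M(controls)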

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.

Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19. [PubMed] [Article]

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.

Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.

Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34. [PubMed] [Article]

Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.

Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.

Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.

Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.

Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.

Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2. [PubMed] [Article]

Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.

Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.

Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.

Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]

Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98: Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.

Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.

Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.

Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. L. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.

Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.

Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]

Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.

Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.

Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]

Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.

Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.

Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.

Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.

Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.

Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]

Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.

Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).

Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.

Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).

Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]

Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.

Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]

Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.

't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.

Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.

Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.

Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.

Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.

Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.

Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with the mean probability of a successful outcome given by

p(y = 1 \mid Xb) = \frac{1}{1 + e^{-Xb}}    (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

y \sim \mathrm{Bernoulli}\{p(y = 1 \mid Xb)\}    (A2)

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients b that minimize the negative log-likelihood of the model, given by

L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i)    (A3)


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression has no closed-form solution; it is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
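To make Equations A1–A3 concrete, the sketch below minimizes the negative log-likelihood directly for a hypothetical design matrix X and label vector y; the use of fminsearch is purely illustrative, whereas glmfit (the function named above) solves the same problem internally by iteratively reweighted least squares.

    % Direct minimization of the negative log-likelihood in Equation A3.
    % X: n-by-p feature matrix; y: n-by-1 vector of 0/1 labels (both hypothetical).
    Xd  = [ones(size(X, 1), 1), X];                  % prepend the bias (constant) term
    nll = @(b) -sum(y .* log(1 ./ (1 + exp(-Xd * b)))) ...
               -sum((1 - y) .* log(1 - 1 ./ (1 + exp(-Xd * b))));
    bHat = fminsearch(nll, zeros(size(Xd, 2), 1));   % derivative-free minimization, for illustration
    % Equivalent, and what the text above describes, via the Statistics Toolbox:
    % bHat = glmfit(X, y, 'binomial', 'link', 'logit');   % uses IRLS internally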

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

D_0 = -2\log(L_0 / L_{\mathrm{sat}})    (A4)

D = -2\log(L / L_{\mathrm{sat}})    (A5)

where L_0, L_{\mathrm{sat}}, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

D - D_0 = -2\log(L / L_0)    (A6)

A negative likelihood ratio suggests that the model under consideration outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
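For Bernoulli observations the saturated model fits each data point perfectly, so its log-likelihood is zero and each deviance reduces to −2 times the corresponding log-likelihood. A brief MATLAB sketch of Equations A4–A6 in those terms (variable names are illustrative, and chi2cdf is an assumed convenience):

    % Deviance-based likelihood ratio against an intercept-only null model (Equations A4-A6).
    [b, D] = glmfit(X, y, 'binomial', 'link', 'logit');     % D: deviance of the fitted model
    p0 = mean(y);                                           % null model predicts only the base rate
    D0 = -2 * (sum(y) * log(p0) + sum(1 - y) * log(1 - p0));
    LR = D - D0;                           % negative when the model outperforms the null
    pval = 1 - chi2cdf(-LR, size(X, 2));   % chi-squared test, one df per independent variable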

Relative pseudo-R²

A related metric to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, R_D^2 = 1 - L/L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature, or of a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2013). The relative measure is defined as 1 - L_{full}/L_{partial}, where L_{full} and L_{partial} are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
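A compact sketch of the relative pseudo-R² for a single feature, evaluated on a held-out test set as described above (the feature index j, the train/test split variables, and the column layout are illustrative assumptions):

    % Relative pseudo-R2 of feature j on held-out data (Xte, yte), trained on (Xtr, ytr).
    keep   = [1:j-1, j+1:size(Xtr, 2)];                    % all features except feature j
    bFull  = glmfit(Xtr, ytr, 'binomial', 'link', 'logit');
    bPart  = glmfit(Xtr(:, keep), ytr, 'binomial', 'link', 'logit');
    pFull  = glmval(bFull, Xte,          'logit');
    pPart  = glmval(bPart, Xte(:, keep), 'logit');
    llFull = sum(yte .* log(pFull) + (1 - yte) .* log(1 - pFull));
    llPart = sum(yte .* log(pPart) + (1 - yte) .* log(1 - pPart));
    relR2  = 1 - llFull / llPart;   % > 0 when feature j adds predictive power on the test set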


real-time UNIX operating system for behavioralcontrol and eye position monitoring Visual stimuliwere generated by a second independent graphicsprocess (QNXndashPhoton) running on the same PC andrear-projected onto a tangent screen in front of themonkey by a CRT video projector (Sony VPH-D50 75Hz noninterlaced vertical scan rate 1024 middot 768resolution) The distance between the front of themonkeyrsquos eye and the screen was 109 cm The stimulispanned 488 middot 368 of visual field

Saccade detection

Saccade beginnings and endings were detected usingan in-house algorithm We used thresholds of 1008s forstart and stop velocities and marked a saccade startingtime when the velocity increased above this thresholdcontinuously for 10 ms Likewise saccade-ending timeswere marked when the velocity decreased continuouslyfor 10 ms and fell below 1008s at the end of this periodof decrease We considered saccades longer than 150 msas potentially resulting from eye-blinks or otherartifacts and discarded them Fixation locations werecomputed as the median (x y) gaze coordinate in theintersaccadic interval Since monkeys are head-fixedthey tend to make a few long-range saccades that areunlikely to occur in a naturalistic environment wherethe head is free To mitigate the potential effect of largesaccades we restricted our analysis to saccades under208 (we also reanalyzed the data with all saccades anddid not find any qualitative differences in our results)

Modeling peripheral visual acuity with a retinaltransform

It is well known that visual features cannot beresolved in the periphery with the same resolution as atthe point of gaze and that the representation of visualinformation in the brain is retinotopic Despite this themost influential models of eye gaze use visual featuresat fixation at full resolution and most bottom-upsaliency maps are computed in image-centered coordi-nates (eg Itti et al 1998) However when theoculomotor system computes a saliency map for eachsaccade it can only work with retinotopic representa-tions of local image statistics in the visual cortex Thusmodels of priority that take into account degradedrepresentations of peripheral information (eg Hoogeamp Erkelens 1999 Zelinsky et al 2013) and analysismethods that look for effects of visual features atdifferent saccade lengths might provide insight into thephenomena underlying the computation of priority inthe brain To model the peripheral sensitivity of theretina in a more realistic manner we implemented asimple blurring filter

The degraded image can be obtained as a parametricblurring of the original image with the spatial extent ofthe blur filter scaled by the distance from the previousfixation In practice we precomputed a Gaussianpyramid filter bank at three levels and convolved it withthe original image We then selected appropriatepyramids for each eccentricity according to theimplementation described in Geisler and Perry (1998)

First we transformed the natural image to a Gaussianpyramid An image pyramid is a collection of trans-

Figure 1 Experimental design Two monkeys were asked to search for 28 middot 28 Gabor-wavelet targets of either (A) vertical or (B)

horizontal orientation alpha-blended into natural scenes Red arrows (not displayed to the monkeys) indicate the locations of the

target These target locations were distributed randomly with a uniform distribution across the scene For each trial the monkeys

were permitted at most 20 fixations to locate the target

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 3

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

formed images each typically obtained by applying aconvolution to the original image Here we used a 3-levelGaussian pyramid with each level downsampled by afactor of 2 and convolved with a Gaussian filter(Supplementary Figure S1) We used the implementationprovided by the Matlab Pyramid Toolbox (httpwwwcnsnyuedulcvsoftwarephp) For each pyramid levelwe calculated the maximally resolvable spatial frequencyf as the Nyquist rate (half the sampling rate)

Next using the method by Geisler and Perry (1998)we calculated the critical eccentricity for a givenresolvable spatial frequency according to known retinalphysiology The critical eccentricity ec is given by

ec frac14e2

afln

1

CT0

e2 eth1THORN

where e2 is the half-resolution eccentricity a is the spatial

frequency decay constant f is the Nyquist rate and CT0

is the minimum contrast threshold We used the standardparameters of e2frac1423 afrac140106 and CT0frac14004 used byGeisler and Perry (1998)

We then swept through the image pixel-by-pixel andassigned the grayscale value of the pyramid level whosebest-resolvable frequency exceeds the one that can beresolved at the eccentricity of that pixel with respect toa given fixation location Once this retinal transform isapplied the detectability of high-frequency featuresfrom far away is clearly degraded (Figure 2)

Computation of visual features at fixation

To understand visual features that drive gaze behav-ior we computed a number of features from image

Figure 2 The effect of peripheral blurring on sensitivity to visual features Heat maps of saliency edgendashenergy and relevance are

overlaid on the retinally transformed images Left No blur filter Center and right Blur filter applied at two different fixation locations

indicated by the yellow crosshair

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 4

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

patches at fixated locations after peripherally blurringthem using a retinal transform The patch size specifiesthe size of the spotlight in the periphery over which visualfeatures are pooled prior to saccade decisions We chosepatch sizes of 18middot 18 28middot 28 and 48middot 48 and performedall analyses separately for each patch size

We computed one bottom-up feature (saliency) onetask-specific feature (relevance) and three low-levelorientation statistics (edgendashenergy orientedness andpredominant orientation)

Bottom-up saliency

Bottom-up saliency is the extent to which an imagepatch stands out with respect to its immediate neigh-borhood Typically saliency at a particular location is

operationally defined as the difference between a featurersquos(or set of features) value at that location and the featurersquos(or set of features) value in the immediately surroundingneighborhood This center-surround difference map istypically computed formultiple features atmultiple scalesand then summed together to obtain a saliency map (Ittiamp Koch 2001) A number of bottom-up saliency mapshave been proposed (see Borji Sihite amp Itti 2013 for asurvey) We used the most popular model (Itti et al1998) implemented in the Matlab Saliency Toolbox(wwwsaliencytoolboxnet Walther amp Koch 2006) witheight orientations and four scales (see Figure 3A)

Top-down relevance or target-similarity

The relevance of a fixated image patch must measurethe extent to which the information in the patch is

Figure 3 Computation of priority maps based on bottom-up saliency and top-down relevance The yellow crosshair represents the

fixation with respect to which the peripheral blurring operator is applied (A) Saliency Itti-Koch saliency (Itti et al 1998) computes

luminance contrast and orientation contrast at four different scales and eight different orientations and adds them across scales to

produce a saliency map (B) Relevance A Gabor-detector computes template Gabor filter outputs for a quadrature pair (in-phase

Gabor and its 908 phase shift) and takes the sum of squares to produce a relevance map Vertical and horizontal relevance maps are

computed for modeling the vertical and horizontal tasks respectively Only the vertical relevance map is shown here

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 5

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

relevant for the search task We assume a greedysearcher who looks at patches that maximally resem-bled the target

Before computing a similarity measure to the targetwe locally standardized the grayscale values of thepatch and then linearly stretched them to lie between 0and 1

Zethx yTHORN frac14 Iethx yTHORN lr

eth2THORN

Uethx yTHORN frac14 Zethx yTHORN minethZTHORNmaxethZTHORN minethZTHORN eth3THORN

We then defined target relevance of a fixated imagepatch as the sum of squares of the convolution of thefixated image patch with the target Gabor wavelet andits quadrature phase pair

R frac14 ethUGTHORN2 thorn ethUG0THORN2 eth4THORNwhere G and G0 are the Gabor target and its 908 phase-shifted version and represents two-dimensional con-volution The scalar relevance value that is used forsubsequent analysis is then the central pixel value of theconvolved image which happens to be equivalent to thesum of squares of dot products between the patch and thequadrature Gabor pairs for the 28 middot 28 image patch

For illustration the relevance maps visualized inFigures 2 and 3B are obtained by taking the sum ofsquares of the convolution between the target Gaborquadrature pairs and the full-sized contrast-normalizedimage

Orientation statistics

Aside from the composite measures of saliency andrelevance we extracted local orientation statistics usingan eigenvalue decomposition of the energy tensorcomputed as the covariance matrix of horizontal andvertical edge gradients at each pixel (Figure 4 Ganguliet al 2010)

Ixethx yTHORN frac14]Iethx yTHORN

]xeth5THORN

Iyethx yTHORN frac14]Iethx yTHORN

]yeth6THORN

Cethx yTHORN frac14 covethIx IxTHORNcovethIx IyTHORN

covethIx IyTHORNcovethIy IyTHORN

eth7THORN

where Ix and Iy are horizontal and vertical gradientsand C(x y) is the energy tensor Horizontal andvertical image gradients were computed as differencesbetween consecutive columns and rows of the imagerespectively From the eigenvalues of C(x y) k1 and

k2 we then computed the following orientationstatistics

The edgendashenergy of the patch is given by Efrac14k1thornk2the eigendirection h frac14 tan1[cov(Ix Ix) k2][cov(IxIy)] gives the predominant orientation and the ratio ofvariance along the eigendirection and its orthogonaldirection ffrac14 (k1 k2)(k1 thorn k2) gives the orientednessof the patch (0 for no orientation 1 for bars and edgesin any direction) We then computed measure forverticalness as tfrac14 jsin hj Verticalness is 1 for aperfectly vertical patch and 0 for a perfectly horizontalpatch

Analysis of fixation statistics

Our goal was to quantify the relative importanceof each visual statistic in predicting fixation behaviorand to study how these influences evolved overdifferent time scales The typical method to measurethe predictive power of a visual statistic is tocompare its value at fixated versus nonfixated pointsin the stimulus image However the stimuli arehuman-photographed scenes with objects of interestnear the center therefore fixated locations tend tobe closer to the center of the image rather thanrandomly sampled nonfixated locations This isknown as the center bias and can potentially inflateeffect size for a given statistic (eg Kanan TongZhang amp Cottrell 2009 Tseng Carmi CameronMunoz amp Itti 2009) To avoid center bias shuffled-control statistics are taken from the same location asthe current fixation but from a different image Inall subsequent analyses of visual statistics at fixationwe compared real visual features at fixation withshuffled controls computed as described above andalways excluded the very last fixation to avoidconfounds from the target itself

Fourier analysis

Local power spectra of natural image patches arehighly informative about scene texture and structure(Oliva amp Torralba 2001) and can be diagnostic ofnatural scene categories (Torralba amp Oliva 2003) Inparticular the two-dimensional Fast Fourier Trans-form (FFT) allows us to visualize Fourier amplitude atdifferent spatial frequencies and orientations at thesame time To obtain a qualitative understanding ofwhether visual statistics at fixation were biased by thesought-after Gabor target we studied the similaritybetween the target and fixated image patches byexamining their spectral properties (see eg Hwang etal 2009) We computed local Fourier amplitude(absolute value of the FFT) in 18 middot 18 28 middot 28 and 48 middot48 windows at each fixation using the real image and a

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 6

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

shuffle-control image We averaged the obtained power

spectra across fixations for each trial and then

computed a two-tailed t statistic across trials of the

difference in log absolute FFT between fixated patches

and shuffled-controls

Receiver operating characteristic analysis

To study the extent to which each visual featurecould predict gaze we performed a receiver operatingcharacteristic (ROC) curve analysis comparing thedistributions of fixated patches and shuffled controls

Figure 4 Orientation statistics For each natural scene we computed the local orientation statistics at all fixated and shuffled-control

patches edgendashenergy defined as the sum of squares of local gradients orientedness defined as a normalized ratio of eigenvalues of

the local orientation covariance matrix which measures the extent to which a single orientation is predominant and the predominant

orientation defined as the eigendirection Verticalness is defined as the absolute sine of the predominant orientation

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 7

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

over each visual feature We computed the area underthe ROC curve (AUC) for each feature and boot-strapped across fixations to obtain 95 confidenceintervals (CIs) for the AUCs

Analysis of visual feature influence within a trial

Although analysis of visual features at fixationsacross trials and sessions provide a general idea ofwhich features influence saccade planning they im-plicitly assume that the strategy of planning saccadesremains unchanged over time Since our goal is toobtain an insight into how search strategy evolves overtime we studied the difference between visual featuresat early and late fixations To understand how searchstrategy evolves within a single trial we computed theSpearmanrsquos rank correlation coefficient between dif-ferent visual features and two eye-movement variablesthe numbered order of fixation within a trial (fixationorder) and the saccade length made to reach thatfixation location along with their correspondingbootstrapped 95 confidence intervals

Analysis of visual feature analysis over sessions usingmultivariate decoding

Although ROC analysis provides a useful charac-terization of the influence of individual features it is ametric that compares two one-dimensional distribu-tions The AUC is a good measure of the importance ofa given variable but comparing two AUCs obtainedwith two different variables is not meaningful as ametric of relative importance if the variables arecorrelated However most features that inform bot-tom-up saliency are correlated with each other and withlocal contrast (Kienzle Franz Scholkopf amp Wich-mann 2009) Further in our task the Gabor-similaritymetric (relevance) may be correlated with orientationor edgendashenergy Therefore a large AUC for relevancecould merely result from the effect of a large AUC foredgendashenergy or vice versa

Multivariate analysis is a suitable tool for potentiallyexplaining away the effects of correlated variables Forinstance over the past few years generalized linearmodels (GLMs Nelder amp Wedderburn 1972) havebeen used to model spike trains when neural activitymay depend on multiple potentially correlated vari-ables (Fernandes et al 2013) Here we used aparticular GLM multivariate logistic regression todiscriminate fixated patches from shuffled controlsbased on the local visual features defined abovesaliency relevance edgendashenergy orientedness andverticalness defined as the sine of the predominantorientation

Using this decoding technique we estimated effectsizes for each feature using partial models leaving each

feature out of the full model comprising all featuresand comparing these partial models with the full model(see Appendix A for details on logistic regressiongoodness of fit characterization and effect sizeestimation) For this analysis we pooled fixationsacross four consecutive sessions for each search taskand each animal

Results

To understand the relative contributions of visualfeatures that guide saccades during naturalistic visionwe had two monkeys search for embedded Gaborwavelets in human-photographed natural scenesMonkeys performed the task over two sets of fourconsecutive days with the sets differing on theorientation of the target vertical in one set andhorizontal in the other We modeled the effects ofdecreasing retinal acuity as a function of eccentricityand analyzed visual features at fixated locations innatural scenes to ask how the features of the targetstimulus (task-relevant) as well as other features of thevisual scene (bottom-up) affect the search behavior Wethen studied how the influence of these visual featuresevolved across multiple time scales

Search performance

We first wanted to see if the monkeys couldsuccessfully perform the task that is if they couldsuccessfully find the target in a considerable number ofthe 700ndash1500 trials they completed on each of the eightdays We found that while one monkey (MAS15)performed better than the other (MAS16) as measuredby the fraction of successful trials (65 vs 41) and bythe mean number of fixations required to find thetarget (71 vs 83) both monkeys were able to find thetarget in each session (Supplementary Figure S2)Although there are differences in performance consis-tent with previously observed behavior of bothmonkeys (Kuo Kording amp Segraves 2012) they wereable to perform the task

Fourier analysis reveals the effect of edgendashenergy and target orientation

Gabor wavelets have a well-defined spectral signa-ture (Figure 5 first row) Thus we may reasonablyexpect saccade target selection to be guided by thelocal frequency content in natural scenes To get aninsight into this possibility we computed 2-D Fourierpower spectra at fixated and shuffled-control patches

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 8

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

at three different patch sizes (18 middot 18 28 middot 28 and 48 middot48) We observed that the fixated patches have higheramplitude than nonfixated patches (positive t scoresFigure 5) suggesting that monkeys are biased to lookat locations of high amplitude As expected highedgendashenergy patches are attractive targets for fixa-tions

Crucially this same Fourier analysis allows us toinquire if the peaks in energy are consistent with thetarget orientation Indeed we also observed that thefixated patches have higher energy along the targetorientation In particular we found higher energyalong the vertical direction for the vertical task andalong the horizontal direction for the horizontal task(Figure 5) and that this effect appears was morepronounced for 48 middot 48 patches than 18 middot 18 patchesThe greater concentration of target power for the 48 middot48 case might be consistent with motor noise inexecuting saccades where the saccade lands slightlyaway from the intended location Thus Fourieranalysis provides a clear and intuitive visualizationthat monkeys fixate spots where the target orientationis dominant and this effect is better detected when

visual features are averaged over larger areas in theperiphery

ROC Analysis

Fourier analysis suggested that energy and task-specific target orientation were diagnostic of fixatedpatches during natural scene search Motivated bythis observation we intended to quantify the abilityof visual features to predict fixations We extractedlocal visual features saliency edgendashenergy relevanceorientedness and verticalness (see Methods) fromfixated patches and shuffled controls and thencompared them using ROC analysis We found thaton average across animals and sessions for each taskfixated patches were more salient had higher edge-energy were more relevant and had a higherverticalness than shuffled-control patches regardlessof the task however orientedness was not diagnosticof the difference between fixated and nonfixatedpatches1 (Figure 6) According to the analysis of onefeature at a time saliency edgendashenergy relevance andverticalness are all predictive of fixations

Figure 5 Spectral analysis of fixation locations Monkey fixation locations exhibit target-dependent orientation bias A vertical task B

horizontal task Top row Target Gabor for the search task (left) and its power spectrum (right) Bottom three rows the mean t statistic

across four consecutive days of the difference between log Fourier power of fixated patches and shuffled controls for monkeys

MAS15 (left) and MAS 16 (right) for three different patch sizes from top to bottom 18 middot 18 28 middot 28 and 48 middot 48 Fourier power is

clearly enhanced along the direction of target orientation Units on the color bars indicate the mean t score

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 9

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Evolution of search strategy within a trial

Our goal was to study the temporal evolution ofsearch strategy at multiple time scales To understandhow saccade planning evolved at the relatively shorttime scale of within a single trial we correlated thevisual features at fixation with the fixation order (firstsecond third etc) and the length of the saccade madeto land at that fixation We observed that fixatedlocations had different features at early and latefixations In particular saliency edge-energy rele-vance orientedness and verticalness of fixated patchesincreased from the beginning to the end of the trial(Figure 7 and Supplementary Figure S3 filled blackdiamonds p 00001 for both tasks) We also foundthat saliency edge-energy relevance and verticalness offixated patches were negatively correlated p 00001for both animals and tasks with the length of thesaccade made to get to that location (Figure 7 andSupplementary Figure S2 filled blue squares) Bycontrast orientedness was positively correlated withfixation number and saccade length (Figure 7 andSupplementary Figure S2) These effects are abolishedas expected when the same analysis is performed withshuffled controls instead of fixated patches (unfilledblack diamonds and blue squares in Figure 7 andSupplementary Figure S2) Crucially these effects areonly consistent across animals and tasks when a retinaltransform is applied (last data point in all panels ofFigure 7 and Supplementary Figure S2) We furtherverified that these phenomena are independent that isalthough patches of high relevance or edge-energy arefixated early on in the trial by short saccades this is notmerely because early saccades are short on average Inparticular we found significant but very weak corre-lations between fixation order and saccade length foreither monkey (across tasks r frac14003 for the verticaltask and r frac14 002 for the horizontal task p 001 for

both correlations) which were not sufficient to explainthe high correlations observed with visual features atfixation Together these results show how searchstrategy evolves during the trial with predictive visualfeatures being fixated later in the trial

Multivariate analysis of the evolution of searchstrategy over days

Although ROC analysis suggests that saliencyrelevance and edgendashenergy can predict fixations onefundamental limitation of the technique is that it canonly quantify the effect of one variable at a timeHowever in practice these variables tend to becorrelated In our data for instance relevance is clearlycorrelated with edgendashenergy (q frac14 025 p 00001 16sessions two animals 28 middot 28 patch with retinaltransform 57775 fixations) and edgendashenergy is corre-lated with saliency q frac14 002 p 00001 Thecorrelation is problematic because the true effect ofedgendashenergy might manifest as an apparent effect ofsaliency or relevance To address this concern weapplied multivariate logistic regression to decodefixated from nonfixated patches as a function of localvisual features (see Methods)

Visual features could discriminate fixated fromnonfixated patches

Using multivariate logistic regression we asked iffixated patches could be decoded from shuffled-controlpatches We found that fixated patches and shuffledcontrols could be decoded reliably above chance level(50) Decoding accuracies did not differ significantlyacross tasks (vertical vs horizontal target) Fixatedpatches were better decoded when peripheral visual

Figure 6 Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distribution of visual features at fixated

patches and shuffled controls across approximately 55000 fixations (pooled across 16 sessions from two monkeys) for each task and

each patch size Black bars represent the vertical Gabor task and red curves represent the horizontal task Error bars show

bootstrapped 95 confidence intervals

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 10

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Figure 7 How does search strategy evolve within a trial Spearmanrsquos rank correlation coefficients between fixation order (black

diamonds) or saccade length (blue squares) and visual features saliency edgendashenergy relevance orientedness and verticalness are

shown for one monkey (MAS15) for each task Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 11

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

patches were not degraded by the retinal transform (last column of Table 1).
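As a rough sketch of the decoding scheme just described (multivariate logistic regression with 10-fold cross-validation, chance level 50%), the code below decodes synthetic "fixated" from "control" feature vectors with scikit-learn. The feature matrices are made-up stand-ins, not the recorded data, and the printed accuracy has no relation to the values in Table 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Rows are patches, columns are the five features used in the paper
# (saliency, edge-energy, relevance, orientedness, verticalness); values are synthetic.
n = 2000
X_fixated = rng.normal(loc=0.2, scale=1.0, size=(n, 5))   # fixated patches (label 1)
X_control = rng.normal(loc=0.0, scale=1.0, size=(n, 5))   # shuffled controls (label 0)

X = np.vstack([X_fixated, X_control])
y = np.concatenate([np.ones(n), np.zeros(n)])

# 10-fold cross-validated decoding accuracy; 0.5 is chance level.
clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"decoding accuracy: {acc.mean():.3f} +/- {acc.std():.3f} (chance = 0.5)")
```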

Relative influence of visual features on saccade choice

Having established that the visual features at fixation could predict eye movement behavior, we were interested in the relative influence of individual features. By decoding fixated patches from shuffled controls in the two tasks, we found that edge–energy was predominantly more influential than all other features in predicting fixations, and saliency was completely explained away for both monkeys (top two panels in Figure 8; Table 2). These results appear to agree with findings in human studies, which suggest that bottom-up saliency plays a minimal role in explaining saccades during search tasks (e.g., Einhäuser et al., 2008). However, it must be noted that if edge–energy is a bottom-up feature, then bottom-up features appear to play an important role in gaze prediction even during search.

Given the demands of the search task (finding a Gabor target in a natural scene), it is worth considering the possibility that edge–energy is not strictly a bottom-up feature, but that locations with high edge–energy are sought after by the monkeys simply because Gabor wavelets have high edge–energy. Although seeking high-energy patches is one possible search strategy, our Fourier analysis (Figure 5) and ROC analysis (Figure 6) suggest that target similarity (relevance) or target orientation also plays a role. Indeed, when using multivariate decoding, we found that task-relevant features had nonzero effects (bottom three panels in Figure 8; Table 2). In particular, relevance played an important role for MAS15 but not MAS16, whereas the reverse was true for orientedness (Figure 8, panels 3 and 4 from above; Table 2). On the basis of this result, it is not possible to conclude whether the internal search template guiding top-down saccade choice is comprised of the target in its entirety (relevance), orientedness, or the target orientation (verticalness); one dominant strategy does not win out across data from both animals and tasks, but all seem to be important.

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategy over consecutive days, monkey MAS15 had performed the vertical search task ahead of the horizontal task, and the order of tasks was swapped for MAS16. Since these tasks were separated by a few weeks, we had the opportunity to compare search strategy across tasks. Indeed, for the 2° × 2° window, we observed a significant difference in the effect size of verticalness (a feature that captures the predominant orientation) across the two tasks. In particular, for MAS15, who had performed the horizontal task later, verticalness is significantly predictive of saccades in the horizontal task, with shuffled-control patches being more vertical than fixated patches (β ± SE = -0.0546 ± 0.0161, p = 0.0009; see Table 2B and bottom panels in Figure 8). For MAS16, who had performed the vertical task later, verticalness is significantly predictive of saccades in the vertical task, with fixated patches being more vertical than shuffled-control patches (β ± SE = 0.1232 ± 0.0201, p < 0.0001; see Table 2C and bottom panels in Figure 8). This practice effect of verticalness also correlates with search performance, as quantified by the number of fixations required to find the target. In particular, a Wilcoxon rank-sum test for different median number of fixations for earlier and later tasks suggests that both monkeys improve across tasks (p < 0.00001, z = 29.1, median of 6 vs. 4 fixations for MAS15, and p < 0.00001, z = 8.8, median of 8 vs. 7 fixations for MAS16). From these results, it appears that animals gradually learn to place more emphasis on

                          Retinal transform   Retinal transform   Retinal transform   No retinal transform
                          1° × 1°             2° × 2°             4° × 4°             2° × 2°
MAS15, vertical task      55.7 ± 0.9          55.5 ± 0.6          56.8 ± 0.5          61.5 ± 0.7
MAS15, horizontal task    56.5 ± 0.6          56.4 ± 0.8          56.6 ± 0.8          64.3 ± 1.0
MAS16, vertical task      58.4 ± 0.6          60.2 ± 1.3          60.5 ± 1.4          62.9 ± 0.7
MAS16, horizontal task    59.2 ± 0.8          60.9 ± 0.6          62.3 ± 1.3          62.3 ± 1.0

Table 1. Decoding accuracies (mean percentage of correctly predicted patches and standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.


the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.
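The practice effect reported above rests on a Wilcoxon rank-sum comparison of the per-trial number of fixations needed to find the target in the earlier versus the later task. Below is a minimal sketch of that test on made-up fixation counts; n_fix_early_task and n_fix_later_task are placeholders, not the recorded data.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)

# Hypothetical per-trial fixation counts for the earlier and the later task.
n_fix_early_task = rng.poisson(lam=6, size=3000) + 1
n_fix_later_task = rng.poisson(lam=4, size=3000) + 1

# Wilcoxon rank-sum test for a difference in the median number of fixations.
z, p = ranksums(n_fix_early_task, n_fix_later_task)
print(f"z = {z:.1f}, p = {p:.2g}")
print(f"medians: {np.median(n_fix_early_task):.0f} vs {np.median(n_fix_later_task):.0f} fixations")
```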

Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance and how they evolved over time when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features are shown for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.


features while allowing for bottom-up features to explain them away. We found that over short time scales of a single trial, monkeys preferred to fixate target locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades.

As the trial progressed, locations of higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated. Over longer time scales of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated

Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and log-likelihoods, chi-squared (χ²) statistics, and pseudo-R²s (R_D²), as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A: MAS15, vertical target. LR: χ²(6) = 704.4, p = 0; log-likelihood = -23,992; R_D² = 0.0143 (standardized pseudo-R², test set).

Variable        β         SE        t         p         Likelihood ratio   p
Constant        0.0180    0.0108    1.6706    0.0954
Saliency        0.0085    0.0108    0.7809    0.4503    0.6797             0.9442
Edge–energy     0.2612    0.0131    19.8995   <0.0001   455.2963           0
Relevance       0.1154    0.0133    8.6978    <0.0001   83.1493            <0.0001
Orientedness    0.0268    0.0120    2.2281    0.0319    5.0343             0.3035
Verticalness    0.0096    0.0126    0.7664    0.4607    0.6666             0.9450

Part B: MAS15, horizontal target. LR: χ²(6) = 704.4, p = 0; log-likelihood = -16,714; R_D² = 0.0203 (standardized pseudo-R², test set).

Variable        β         SE        t         p         Likelihood ratio   p
Constant        0.0337    0.0130    2.5918    0.0097
Saliency        0.0287    0.0131    2.1859    0.0341    4.8705             0.3187
Edge–energy     0.2880    0.0177    16.2238   <0.0001   316.5875           0
Relevance       0.1889    0.0176    10.7034   <0.0001   131.5860           0
Orientedness    0.0359    0.0163    2.2012    0.0417    4.9799             0.3190
Verticalness    -0.0546   0.0161    -3.3997   0.0009    11.6278            0.0258

Part C: MAS16, vertical target. LR: χ²(6) = 769.5, p = 0; log-likelihood = -10,576; R_D² = 0.0346 (standardized pseudo-R², test set).

Variable        β         SE        t         p         Likelihood ratio   p
Constant        0.0861    0.0165    5.2211    <0.0001
Saliency        0.0032    0.0164    0.1958    0.7475    0.1470             0.9953
Edge–energy     0.5386    0.0253    21.2555   <0.0001   584.6116           0
Relevance       0.0380    0.0197    1.9298    0.0621    3.9210             0.4398
Orientedness    0.1494    0.0198    7.5346    <0.0001   57.1285            <0.0001
Verticalness    0.1232    0.0201    6.1210    <0.0001   37.5927            <0.0001

Part D: MAS16, horizontal target. LR: χ²(6) = 1099.4, p = 0; log-likelihood = -14,001; R_D² = 0.0372 (standardized pseudo-R², test set).

Variable        β         SE        t         p         Likelihood ratio   p
Constant        0.0775    0.0144    5.3812    <0.0001
Saliency        0.0069    0.0143    0.4846    0.6132    0.4056             0.9694
Edge–energy     0.6241    0.0236    26.4207   <0.0001   953.3296           0
Relevance       0.0025    0.0164    0.1493    0.7738    0.1008             0.9982
Orientedness    0.0707    0.0178    3.9771    0.0002    15.9725            0.0059
Verticalness    0.0224    0.0173    1.2956    0.2320    1.8565             0.7618


with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 (Figure 8), which, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or other features.
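The spatial-spread argument above can be checked directly by computing the 2-D autocorrelation of a feature map. The sketch below computes a normalized autocorrelation via the Wiener–Khinchin theorem and summarizes the width of its central peak; the two synthetic maps (white noise smoothed at different scales) merely stand in for a narrowly versus broadly correlated feature and are not derived from the stimuli used in the study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def autocorrelation_2d(feature_map):
    """Normalized 2-D autocorrelation computed via the Wiener-Khinchin theorem."""
    f = feature_map - feature_map.mean()
    power = np.abs(np.fft.fft2(f)) ** 2
    ac = np.fft.fftshift(np.real(np.fft.ifft2(power)))  # zero lag at the center
    return ac / ac.max()

rng = np.random.default_rng(5)

# Synthetic "feature maps": white noise smoothed at two scales stands in for a
# narrowly vs. broadly correlated feature (e.g., saliency vs. edge-energy).
narrow = gaussian_filter(rng.normal(size=(256, 256)), sigma=2)
broad = gaussian_filter(rng.normal(size=(256, 256)), sigma=8)

for name, fmap in [("narrow", narrow), ("broad", broad)]:
    ac = autocorrelation_2d(fmap)
    # Size of the central peak: number of pixels with autocorrelation above 0.5.
    print(f"{name}: peak area = {int((ac > 0.5).sum())} px")
```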

One limitation of our approach is that although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between certain visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies, we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings on monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans, but they generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled targets (Rajashekar, Bovik, & Cormack, 2006). They also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed clear differences in power spectra between fixated versus nonfixated patches, and also between these spectral differences across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of


potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but this variance largely emerges from higher-order statistics quantifiable using bispectral densities and is therefore invisible in the power spectra. Their results provide a cautionary note to the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished while that of task-relevant features is enhanced during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and the fact that the influence of bottom-up features is diminished during search, our study might not be ideal to examine the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, to model peripheral degradation for feature extraction and to deploy an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their predictive power of fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that maximizes an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness, horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search in naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also


enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).
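For a single feature, the AUC in this footnote can be computed from the Mann–Whitney U statistic, since AUC = U / (n1 · n2); values above 0.5 indicate that the feature tends to be larger at fixated patches than at shuffled controls. A minimal sketch with made-up feature values (not the study's data):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(3)

# Hypothetical feature values (e.g., edge-energy) at fixated patches and shuffled controls.
feat_fixated = rng.normal(loc=0.3, scale=1.0, size=4000)
feat_control = rng.normal(loc=0.0, scale=1.0, size=4000)

u, p = mannwhitneyu(feat_fixated, feat_control, alternative="two-sided")
auc = u / (feat_fixated.size * feat_control.size)  # AUC > 0.5: feature is higher at fixations
print(f"AUC = {auc:.3f}, p = {p:.2g}")
```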

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.

Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19. [PubMed] [Article]

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.

Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.

Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34. [PubMed] [Article]

Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.

Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.

Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.

Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.

Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.

Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2. [PubMed] [Article]

Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.

Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.

Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.

Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]

Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98 Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.

Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.

Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.

Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. L. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.

Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.

Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]

Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.

Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye tracking study. Proceedings of the Royal Society B, 276, 1949–1955.

Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]

Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.

Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.

Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.

Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.

Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.

Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]

Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.

Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).

Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.

Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).

Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]

Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.

Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]

Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.

't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.

Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.

Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.

Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.

Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.

Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.

Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of a successful outcome given by

\[
p(y = 1 \mid X\beta) = \frac{1}{1 + e^{-X\beta}} \qquad (A1)
\]

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

\[
y \sim \mathrm{Bernoulli}\left\{ p(y = 1 \mid X\beta) \right\} \qquad (A2)
\]

The probability of the outcome is then defined by the joint influence of independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients β that minimize the negative log likelihood of the model, given by

\[
L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i) \qquad (A3)
\]


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
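The following sketch restates Equations A1–A3 in code and fits the coefficients with a generic gradient-based optimizer. It is a hedged stand-in, in Python, for the Matlab glmfit call mentioned above (not the authors' implementation), and the design matrix X and labels y are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    """Bernoulli negative log-likelihood of a logistic model (Equation A3)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))   # Equation A1
    eps = 1e-12                           # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(4)

# Synthetic design matrix (bias column + five visual features) and synthetic labels.
n = 5000
features = rng.normal(size=(n, 5))
X = np.column_stack([np.ones(n), features])
true_beta = np.array([0.0, 0.05, 0.3, 0.1, 0.03, 0.01])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# Gradient-based minimization of the negative log-likelihood (cf. Matlab's glmfit).
fit = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]), args=(X, y), method="BFGS")
print("estimated coefficients:", np.round(fit.x, 3))
```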

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

\[
D_0 = -2 \log(L_0 / L_{\mathrm{sat}}) \qquad (A3)
\]

\[
D = -2 \log(L / L_{\mathrm{sat}}) \qquad (A4)
\]

where L_0, L_sat, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

\[
D - D_0 = -2 \log(L / L_0) \qquad (A5)
\]

A negative likelihood ratio suggests that the model under consideration significantly outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, R_D² = 1 − L/L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2013). The relative measure is defined as 1 − L_full/L_partial, where L_full and L_partial are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors on 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
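Putting the appendix together, the sketch below fits a full model, an intercept-only null model, and one leave-one-feature-out partial model on synthetic data, then reports the likelihood ratio against the null (with a χ² p value using the number of added features as degrees of freedom), McFadden's pseudo-R² on a held-out test set, and the relative pseudo-R² of the left-out feature. All variable names and data are placeholders, not the study's.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of a logistic model."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    eps = 1e-12
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic(X, y):
    """Maximum-likelihood fit via gradient-based minimization of the negative log-likelihood."""
    nll = lambda beta: -log_likelihood(beta, X, y)
    return minimize(nll, np.zeros(X.shape[1]), method="BFGS").x

rng = np.random.default_rng(6)
names = ["saliency", "edge-energy", "relevance", "orientedness", "verticalness"]

# Synthetic data: bias + five features; labels 1 = fixated patch, 0 = shuffled control.
n = 6000
F = rng.normal(size=(n, len(names)))
logits = 0.05 + F @ np.array([0.05, 0.4, 0.15, 0.05, 0.02])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
X = np.column_stack([np.ones(n), F])

train = np.arange(n) < n // 2
test = ~train

# Full vs. intercept-only null model: likelihood ratio (training set) and chi-squared p value.
beta_full = fit_logistic(X[train], y[train])
beta_null = fit_logistic(X[train][:, :1], y[train])
lr = 2 * (log_likelihood(beta_full, X[train], y[train])
          - log_likelihood(beta_null, X[train][:, :1], y[train]))
print(f"LR chi2({len(names)}) = {lr:.1f}, p = {chi2.sf(lr, df=len(names)):.2g}")

# McFadden pseudo-R2 on the held-out test set.
ll_full = log_likelihood(beta_full, X[test], y[test])
ll_null = log_likelihood(beta_null, X[test][:, :1], y[test])
print(f"McFadden pseudo-R2 (test set) = {1 - ll_full / ll_null:.4f}")

# Relative pseudo-R2 of one feature: refit without it and compare on the test set.
drop = names.index("edge-energy")
keep = [0] + [j + 1 for j in range(len(names)) if j != drop]
beta_part = fit_logistic(X[train][:, keep], y[train])
ll_part = log_likelihood(beta_part, X[test][:, keep], y[test])
print(f"relative pseudo-R2 ({names[drop]}) = {1 - ll_full / ll_part:.4f}")
```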


Page 4: Pavan Ramkumar - Modeling peripheral visual acuity enables discovery …pavanramkumar.github.io/pdfs/08-Ramkumar_etal_JVis_2015.pdf · 2017-01-24 · Modeling peripheral visual acuity

formed images each typically obtained by applying aconvolution to the original image Here we used a 3-levelGaussian pyramid with each level downsampled by afactor of 2 and convolved with a Gaussian filter(Supplementary Figure S1) We used the implementationprovided by the Matlab Pyramid Toolbox (httpwwwcnsnyuedulcvsoftwarephp) For each pyramid levelwe calculated the maximally resolvable spatial frequencyf as the Nyquist rate (half the sampling rate)

Next using the method by Geisler and Perry (1998)we calculated the critical eccentricity for a givenresolvable spatial frequency according to known retinalphysiology The critical eccentricity ec is given by

ec frac14e2

afln

1

CT0

e2 eth1THORN

where e2 is the half-resolution eccentricity a is the spatial

frequency decay constant f is the Nyquist rate and CT0

is the minimum contrast threshold We used the standardparameters of e2frac1423 afrac140106 and CT0frac14004 used byGeisler and Perry (1998)

We then swept through the image pixel-by-pixel andassigned the grayscale value of the pyramid level whosebest-resolvable frequency exceeds the one that can beresolved at the eccentricity of that pixel with respect toa given fixation location Once this retinal transform isapplied the detectability of high-frequency featuresfrom far away is clearly degraded (Figure 2)

Computation of visual features at fixation

To understand visual features that drive gaze behav-ior we computed a number of features from image

Figure 2 The effect of peripheral blurring on sensitivity to visual features Heat maps of saliency edgendashenergy and relevance are

overlaid on the retinally transformed images Left No blur filter Center and right Blur filter applied at two different fixation locations

indicated by the yellow crosshair

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 4

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

patches at fixated locations after peripherally blurringthem using a retinal transform The patch size specifiesthe size of the spotlight in the periphery over which visualfeatures are pooled prior to saccade decisions We chosepatch sizes of 18middot 18 28middot 28 and 48middot 48 and performedall analyses separately for each patch size

We computed one bottom-up feature (saliency) onetask-specific feature (relevance) and three low-levelorientation statistics (edgendashenergy orientedness andpredominant orientation)

Bottom-up saliency

Bottom-up saliency is the extent to which an imagepatch stands out with respect to its immediate neigh-borhood Typically saliency at a particular location is

operationally defined as the difference between a featurersquos(or set of features) value at that location and the featurersquos(or set of features) value in the immediately surroundingneighborhood This center-surround difference map istypically computed formultiple features atmultiple scalesand then summed together to obtain a saliency map (Ittiamp Koch 2001) A number of bottom-up saliency mapshave been proposed (see Borji Sihite amp Itti 2013 for asurvey) We used the most popular model (Itti et al1998) implemented in the Matlab Saliency Toolbox(wwwsaliencytoolboxnet Walther amp Koch 2006) witheight orientations and four scales (see Figure 3A)

Top-down relevance or target-similarity

The relevance of a fixated image patch must measurethe extent to which the information in the patch is

Figure 3 Computation of priority maps based on bottom-up saliency and top-down relevance The yellow crosshair represents the

fixation with respect to which the peripheral blurring operator is applied (A) Saliency Itti-Koch saliency (Itti et al 1998) computes

luminance contrast and orientation contrast at four different scales and eight different orientations and adds them across scales to

produce a saliency map (B) Relevance A Gabor-detector computes template Gabor filter outputs for a quadrature pair (in-phase

Gabor and its 908 phase shift) and takes the sum of squares to produce a relevance map Vertical and horizontal relevance maps are

computed for modeling the vertical and horizontal tasks respectively Only the vertical relevance map is shown here

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 5

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

relevant for the search task We assume a greedysearcher who looks at patches that maximally resem-bled the target

Before computing a similarity measure to the targetwe locally standardized the grayscale values of thepatch and then linearly stretched them to lie between 0and 1

Zethx yTHORN frac14 Iethx yTHORN lr

eth2THORN

Uethx yTHORN frac14 Zethx yTHORN minethZTHORNmaxethZTHORN minethZTHORN eth3THORN

We then defined target relevance of a fixated imagepatch as the sum of squares of the convolution of thefixated image patch with the target Gabor wavelet andits quadrature phase pair

R frac14 ethUGTHORN2 thorn ethUG0THORN2 eth4THORNwhere G and G0 are the Gabor target and its 908 phase-shifted version and represents two-dimensional con-volution The scalar relevance value that is used forsubsequent analysis is then the central pixel value of theconvolved image which happens to be equivalent to thesum of squares of dot products between the patch and thequadrature Gabor pairs for the 28 middot 28 image patch

For illustration the relevance maps visualized inFigures 2 and 3B are obtained by taking the sum ofsquares of the convolution between the target Gaborquadrature pairs and the full-sized contrast-normalizedimage

Orientation statistics

Aside from the composite measures of saliency andrelevance we extracted local orientation statistics usingan eigenvalue decomposition of the energy tensorcomputed as the covariance matrix of horizontal andvertical edge gradients at each pixel (Figure 4 Ganguliet al 2010)

Ixethx yTHORN frac14]Iethx yTHORN

]xeth5THORN

Iyethx yTHORN frac14]Iethx yTHORN

]yeth6THORN

Cethx yTHORN frac14 covethIx IxTHORNcovethIx IyTHORN

covethIx IyTHORNcovethIy IyTHORN

eth7THORN

where Ix and Iy are horizontal and vertical gradientsand C(x y) is the energy tensor Horizontal andvertical image gradients were computed as differencesbetween consecutive columns and rows of the imagerespectively From the eigenvalues of C(x y) k1 and

k2 we then computed the following orientationstatistics

The edgendashenergy of the patch is given by Efrac14k1thornk2the eigendirection h frac14 tan1[cov(Ix Ix) k2][cov(IxIy)] gives the predominant orientation and the ratio ofvariance along the eigendirection and its orthogonaldirection ffrac14 (k1 k2)(k1 thorn k2) gives the orientednessof the patch (0 for no orientation 1 for bars and edgesin any direction) We then computed measure forverticalness as tfrac14 jsin hj Verticalness is 1 for aperfectly vertical patch and 0 for a perfectly horizontalpatch

Analysis of fixation statistics

Our goal was to quantify the relative importanceof each visual statistic in predicting fixation behaviorand to study how these influences evolved overdifferent time scales The typical method to measurethe predictive power of a visual statistic is tocompare its value at fixated versus nonfixated pointsin the stimulus image However the stimuli arehuman-photographed scenes with objects of interestnear the center therefore fixated locations tend tobe closer to the center of the image rather thanrandomly sampled nonfixated locations This isknown as the center bias and can potentially inflateeffect size for a given statistic (eg Kanan TongZhang amp Cottrell 2009 Tseng Carmi CameronMunoz amp Itti 2009) To avoid center bias shuffled-control statistics are taken from the same location asthe current fixation but from a different image Inall subsequent analyses of visual statistics at fixationwe compared real visual features at fixation withshuffled controls computed as described above andalways excluded the very last fixation to avoidconfounds from the target itself

Fourier analysis

Local power spectra of natural image patches arehighly informative about scene texture and structure(Oliva amp Torralba 2001) and can be diagnostic ofnatural scene categories (Torralba amp Oliva 2003) Inparticular the two-dimensional Fast Fourier Trans-form (FFT) allows us to visualize Fourier amplitude atdifferent spatial frequencies and orientations at thesame time To obtain a qualitative understanding ofwhether visual statistics at fixation were biased by thesought-after Gabor target we studied the similaritybetween the target and fixated image patches byexamining their spectral properties (see eg Hwang etal 2009) We computed local Fourier amplitude(absolute value of the FFT) in 18 middot 18 28 middot 28 and 48 middot48 windows at each fixation using the real image and a

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 6

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

shuffle-control image We averaged the obtained power

spectra across fixations for each trial and then

computed a two-tailed t statistic across trials of the

difference in log absolute FFT between fixated patches

and shuffled-controls

Receiver operating characteristic analysis

To study the extent to which each visual featurecould predict gaze we performed a receiver operatingcharacteristic (ROC) curve analysis comparing thedistributions of fixated patches and shuffled controls

Figure 4 Orientation statistics For each natural scene we computed the local orientation statistics at all fixated and shuffled-control

patches edgendashenergy defined as the sum of squares of local gradients orientedness defined as a normalized ratio of eigenvalues of

the local orientation covariance matrix which measures the extent to which a single orientation is predominant and the predominant

orientation defined as the eigendirection Verticalness is defined as the absolute sine of the predominant orientation

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 7

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

over each visual feature We computed the area underthe ROC curve (AUC) for each feature and boot-strapped across fixations to obtain 95 confidenceintervals (CIs) for the AUCs

Analysis of visual feature influence within a trial

Although analysis of visual features at fixationsacross trials and sessions provide a general idea ofwhich features influence saccade planning they im-plicitly assume that the strategy of planning saccadesremains unchanged over time Since our goal is toobtain an insight into how search strategy evolves overtime we studied the difference between visual featuresat early and late fixations To understand how searchstrategy evolves within a single trial we computed theSpearmanrsquos rank correlation coefficient between dif-ferent visual features and two eye-movement variablesthe numbered order of fixation within a trial (fixationorder) and the saccade length made to reach thatfixation location along with their correspondingbootstrapped 95 confidence intervals

Analysis of visual feature analysis over sessions usingmultivariate decoding

Although ROC analysis provides a useful charac-terization of the influence of individual features it is ametric that compares two one-dimensional distribu-tions The AUC is a good measure of the importance ofa given variable but comparing two AUCs obtainedwith two different variables is not meaningful as ametric of relative importance if the variables arecorrelated However most features that inform bot-tom-up saliency are correlated with each other and withlocal contrast (Kienzle Franz Scholkopf amp Wich-mann 2009) Further in our task the Gabor-similaritymetric (relevance) may be correlated with orientationor edgendashenergy Therefore a large AUC for relevancecould merely result from the effect of a large AUC foredgendashenergy or vice versa

Multivariate analysis is a suitable tool for potentiallyexplaining away the effects of correlated variables Forinstance over the past few years generalized linearmodels (GLMs Nelder amp Wedderburn 1972) havebeen used to model spike trains when neural activitymay depend on multiple potentially correlated vari-ables (Fernandes et al 2013) Here we used aparticular GLM multivariate logistic regression todiscriminate fixated patches from shuffled controlsbased on the local visual features defined abovesaliency relevance edgendashenergy orientedness andverticalness defined as the sine of the predominantorientation

Using this decoding technique we estimated effectsizes for each feature using partial models leaving each

feature out of the full model comprising all featuresand comparing these partial models with the full model(see Appendix A for details on logistic regressiongoodness of fit characterization and effect sizeestimation) For this analysis we pooled fixationsacross four consecutive sessions for each search taskand each animal

Results

To understand the relative contributions of visualfeatures that guide saccades during naturalistic visionwe had two monkeys search for embedded Gaborwavelets in human-photographed natural scenesMonkeys performed the task over two sets of fourconsecutive days with the sets differing on theorientation of the target vertical in one set andhorizontal in the other We modeled the effects ofdecreasing retinal acuity as a function of eccentricityand analyzed visual features at fixated locations innatural scenes to ask how the features of the targetstimulus (task-relevant) as well as other features of thevisual scene (bottom-up) affect the search behavior Wethen studied how the influence of these visual featuresevolved across multiple time scales

Search performance

We first wanted to see if the monkeys couldsuccessfully perform the task that is if they couldsuccessfully find the target in a considerable number ofthe 700ndash1500 trials they completed on each of the eightdays We found that while one monkey (MAS15)performed better than the other (MAS16) as measuredby the fraction of successful trials (65 vs 41) and bythe mean number of fixations required to find thetarget (71 vs 83) both monkeys were able to find thetarget in each session (Supplementary Figure S2)Although there are differences in performance consis-tent with previously observed behavior of bothmonkeys (Kuo Kording amp Segraves 2012) they wereable to perform the task

Fourier analysis reveals the effect of edgendashenergy and target orientation

Gabor wavelets have a well-defined spectral signa-ture (Figure 5 first row) Thus we may reasonablyexpect saccade target selection to be guided by thelocal frequency content in natural scenes To get aninsight into this possibility we computed 2-D Fourierpower spectra at fixated and shuffled-control patches

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 8

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

at three different patch sizes (18 middot 18 28 middot 28 and 48 middot48) We observed that the fixated patches have higheramplitude than nonfixated patches (positive t scoresFigure 5) suggesting that monkeys are biased to lookat locations of high amplitude As expected highedgendashenergy patches are attractive targets for fixa-tions

Crucially this same Fourier analysis allows us toinquire if the peaks in energy are consistent with thetarget orientation Indeed we also observed that thefixated patches have higher energy along the targetorientation In particular we found higher energyalong the vertical direction for the vertical task andalong the horizontal direction for the horizontal task(Figure 5) and that this effect appears was morepronounced for 48 middot 48 patches than 18 middot 18 patchesThe greater concentration of target power for the 48 middot48 case might be consistent with motor noise inexecuting saccades where the saccade lands slightlyaway from the intended location Thus Fourieranalysis provides a clear and intuitive visualizationthat monkeys fixate spots where the target orientationis dominant and this effect is better detected when

visual features are averaged over larger areas in theperiphery

ROC Analysis

Fourier analysis suggested that energy and task-specific target orientation were diagnostic of fixatedpatches during natural scene search Motivated bythis observation we intended to quantify the abilityof visual features to predict fixations We extractedlocal visual features saliency edgendashenergy relevanceorientedness and verticalness (see Methods) fromfixated patches and shuffled controls and thencompared them using ROC analysis We found thaton average across animals and sessions for each taskfixated patches were more salient had higher edge-energy were more relevant and had a higherverticalness than shuffled-control patches regardlessof the task however orientedness was not diagnosticof the difference between fixated and nonfixatedpatches1 (Figure 6) According to the analysis of onefeature at a time saliency edgendashenergy relevance andverticalness are all predictive of fixations

Figure 5 Spectral analysis of fixation locations Monkey fixation locations exhibit target-dependent orientation bias A vertical task B

horizontal task Top row Target Gabor for the search task (left) and its power spectrum (right) Bottom three rows the mean t statistic

across four consecutive days of the difference between log Fourier power of fixated patches and shuffled controls for monkeys

MAS15 (left) and MAS 16 (right) for three different patch sizes from top to bottom 18 middot 18 28 middot 28 and 48 middot 48 Fourier power is

clearly enhanced along the direction of target orientation Units on the color bars indicate the mean t score

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 9

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Evolution of search strategy within a trial

Our goal was to study the temporal evolution ofsearch strategy at multiple time scales To understandhow saccade planning evolved at the relatively shorttime scale of within a single trial we correlated thevisual features at fixation with the fixation order (firstsecond third etc) and the length of the saccade madeto land at that fixation We observed that fixatedlocations had different features at early and latefixations In particular saliency edge-energy rele-vance orientedness and verticalness of fixated patchesincreased from the beginning to the end of the trial(Figure 7 and Supplementary Figure S3 filled blackdiamonds p 00001 for both tasks) We also foundthat saliency edge-energy relevance and verticalness offixated patches were negatively correlated p 00001for both animals and tasks with the length of thesaccade made to get to that location (Figure 7 andSupplementary Figure S2 filled blue squares) Bycontrast orientedness was positively correlated withfixation number and saccade length (Figure 7 andSupplementary Figure S2) These effects are abolishedas expected when the same analysis is performed withshuffled controls instead of fixated patches (unfilledblack diamonds and blue squares in Figure 7 andSupplementary Figure S2) Crucially these effects areonly consistent across animals and tasks when a retinaltransform is applied (last data point in all panels ofFigure 7 and Supplementary Figure S2) We furtherverified that these phenomena are independent that isalthough patches of high relevance or edge-energy arefixated early on in the trial by short saccades this is notmerely because early saccades are short on average Inparticular we found significant but very weak corre-lations between fixation order and saccade length foreither monkey (across tasks r frac14003 for the verticaltask and r frac14 002 for the horizontal task p 001 for

both correlations) which were not sufficient to explainthe high correlations observed with visual features atfixation Together these results show how searchstrategy evolves during the trial with predictive visualfeatures being fixated later in the trial

Multivariate analysis of the evolution of search strategy over days

Although ROC analysis suggests that saliency, relevance, and edge–energy can predict fixations, one fundamental limitation of the technique is that it can only quantify the effect of one variable at a time. However, in practice these variables tend to be correlated. In our data, for instance, relevance is clearly correlated with edge–energy (ρ = 0.25, p < 0.0001; 16 sessions, two animals, 2° × 2° patch with retinal transform, 57,775 fixations), and edge–energy is correlated with saliency (ρ = 0.02, p < 0.0001). This correlation is problematic because the true effect of edge–energy might manifest as an apparent effect of saliency or relevance. To address this concern, we applied multivariate logistic regression to decode fixated from nonfixated patches as a function of local visual features (see Methods).

Visual features could discriminate fixated from nonfixated patches

Using multivariate logistic regression, we asked whether fixated patches could be decoded from shuffled-control patches. We found that fixated patches and shuffled controls could be decoded reliably above chance level (50%). Decoding accuracies did not differ significantly across tasks (vertical vs. horizontal target).

Figure 6. Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distributions of visual features at fixated patches and shuffled controls across approximately 55,000 fixations (pooled across 16 sessions from two monkeys), for each task and each patch size. Black bars represent the vertical Gabor task and red bars the horizontal task. Error bars show bootstrapped 95% confidence intervals.


Figure 7. How does search strategy evolve within a trial? Spearman's rank correlation coefficients between fixation order (black diamonds) or saccade length (blue squares) and the visual features saliency, edge–energy, relevance, orientedness, and verticalness, shown for one monkey (MAS15) for each task. Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls.


Fixated patches were better decoded when the peripheral visual patches were not degraded by the retinal transform (last column of Table 1).
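The decoding accuracies in Table 1 can be estimated with a cross-validated logistic regression over all features jointly. The sketch below is an illustrative reimplementation under assumed inputs (`features_fixated` and `features_shuffled` as fixation-by-feature matrices), not the authors' Matlab pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def decoding_accuracy(features_fixated, features_shuffled, n_folds=10):
    """Mean and SD of cross-validated accuracy for decoding fixated patches
    from shuffled controls using all visual features jointly."""
    X = np.vstack([features_fixated, features_shuffled])
    y = np.concatenate([np.ones(len(features_fixated)), np.zeros(len(features_shuffled))])
    # Standardizing the features keeps coefficients comparable across features.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=n_folds)
    return scores.mean(), scores.std()

# Toy usage with random 5-feature matrices (saliency, edge-energy, relevance,
# orientedness, verticalness) for fixated patches and shuffled controls:
rng = np.random.default_rng(0)
fix = rng.normal(0.1, 1.0, size=(3000, 5))
ctrl = rng.normal(0.0, 1.0, size=(3000, 5))
mean_acc, sd_acc = decoding_accuracy(fix, ctrl)
print(f"decoding accuracy: {100 * mean_acc:.1f}% +/- {100 * sd_acc:.1f}%")
```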

Relative influence of visual features on saccade choice

Having established that the visual features at fixation could predict eye movement behavior, we were interested in the relative influence of individual features. By decoding fixated patches from shuffled controls in the two tasks, we found that edge–energy was predominantly more influential than all other features in predicting fixations, and that saliency was completely explained away, for both monkeys (top two panels in Figure 8; Table 2). These results appear to agree with findings in human studies, which suggest that bottom-up saliency plays a minimal role in explaining saccades during search tasks (e.g., Einhäuser et al., 2008). However, it must be noted that if edge–energy is a bottom-up feature, then bottom-up features appear to play an important role in gaze prediction even during search.

Given the demands of the search task (finding a Gabor target in a natural scene), it is worth considering the possibility that edge–energy is not strictly a bottom-up feature, but that locations with high edge–energy are sought after by the monkeys simply because Gabor wavelets have high edge–energy. Although seeking high-energy patches is one possible search strategy, our Fourier analysis (Figure 5) and ROC analysis (Figure 6) suggest that target similarity (relevance) or target orientation also plays a role. Indeed, when using multivariate decoding, we found that task-relevant features had nonzero effects (bottom three panels in Figure 8; Table 2). In particular, relevance played an important role for MAS15 but not MAS16, whereas the reverse was true for orientedness (Figure 8, panels 3 and 4 from the top; Table 2). On the basis of this result, it is not possible to conclude whether the internal search template guiding top-down saccade choice is comprised of the target in its entirety (relevance), orientedness, or the target orientation (verticalness); one dominant strategy does not win out across the data from both animals and tasks, but all seem to be important.

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategy over consecutive days, monkey MAS15 had performed the vertical search task ahead of the horizontal task, and the order of tasks was swapped for MAS16. Since these tasks were separated by a few weeks, we had the opportunity to compare search strategy across tasks. Indeed, for the 2° × 2° window, we observed a significant difference in the effect size of verticalness (a feature that captures the predominant orientation) across the two tasks. In particular, for MAS15, who had performed the horizontal task later, verticalness is significantly predictive of saccades in the horizontal task, with shuffled-control patches being more vertical than fixated patches (β ± SE = −0.0546 ± 0.0161, p = 0.0009; see Table 2, Part B, and bottom panels in Figure 8). For MAS16, who had performed the vertical task later, verticalness is significantly predictive of saccades in the vertical task, with fixated patches being more vertical than shuffled-control patches (β ± SE = 0.1232 ± 0.0201, p < 0.0001; see Table 2, Part C, and bottom panels in Figure 8). This practice effect of verticalness also correlates with search performance, as quantified by the number of fixations required to find the target. In particular, a Wilcoxon rank-sum test for different median numbers of fixations in the earlier and later tasks suggests that both monkeys improved across tasks (p < 0.00001, z = 29.1, median of 6 vs. 4 fixations for MAS15, and p < 0.00001, z = 8.8, median of 8 vs. 7 fixations for MAS16).

                          Retinal transform   Retinal transform   Retinal transform   No retinal transform
                          1° × 1°             2° × 2°             4° × 4°             2° × 2°
MAS15, vertical task      55.7 ± 0.9          55.5 ± 0.6          56.8 ± 0.5          61.5 ± 0.7
MAS15, horizontal task    56.5 ± 0.6          56.4 ± 0.8          56.6 ± 0.8          64.3 ± 1.0
MAS16, vertical task      58.4 ± 0.6          60.2 ± 1.3          60.5 ± 1.4          62.9 ± 0.7
MAS16, horizontal task    59.2 ± 0.8          60.9 ± 0.6          62.3 ± 1.3          62.3 ± 1.0

Table 1. Decoding accuracies (mean percentage of correctly predicted patches ± standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.

(Figure 7 caption, continued) Error bars show bootstrapped 95% confidence intervals. The symbols on the x-axis indicate the image processing done prior to feature computation: the first three data points correspond to a retinal transform followed by averaging over 1° × 1°, 2° × 2°, and 4° × 4° windows, respectively; the fourth data point indicates averaging over a 2° × 2° window without applying any retinal transform. The star indicates a significance level of p < 0.0001.


From these results, it appears that the animals gradually learn to place more emphasis on the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.
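For completeness, the practice-effect comparison of fixation counts can be run as a Wilcoxon rank-sum test; the snippet below uses synthetic, clearly hypothetical per-trial counts only to show the mechanics.

```python
import numpy as np
from scipy.stats import ranksums

# Hypothetical per-trial fixation counts for the earlier and later task of one monkey.
rng = np.random.default_rng(0)
n_fix_earlier = rng.poisson(6, 800) + 1
n_fix_later = rng.poisson(4, 800) + 1

# Two-sided Wilcoxon rank-sum test for a difference in the number of fixations.
stat, p = ranksums(n_fix_earlier, n_fix_later)
print(f"z = {stat:.2f}, p = {p:.2g}, medians: "
      f"{np.median(n_fix_earlier):.0f} vs {np.median(n_fix_later):.0f}")
```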

Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance, and how they evolve over time, when monkeys searched for Gabor targets embedded in grayscale natural scenes.

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features are shown for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.


We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant features while allowing for bottom-up features to explain them away. We found that, over the short time scale of a single trial, monkeys preferred to fixate target locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades. As the trial progressed, locations with higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated.

Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and p values, along with log-likelihoods, chi-squared (χ²) statistics, and pseudo-R²s (R²_D), as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A: MAS15, vertical target. LR χ²(6) = 704.4, p = 0; log-likelihood = −2399.2; R²_D = 0.0143 (standardized pseudo-R², test set).

Variable       β         SE        t         p         Likelihood ratio   p
Constant       0.0180    0.0108     1.6706   0.0954
Saliency       0.0085    0.0108     0.7809   0.4503      0.6797           0.9442
Edge–energy    0.2612    0.0131    19.8995   <0.0001   455.2963           0
Relevance      0.1154    0.0133     8.6978   <0.0001    83.1493           <0.0001
Orientedness   0.0268    0.0120     2.2281   0.0319      5.0343           0.3035
Verticalness   0.0096    0.0126     0.7664   0.4607      0.6666           0.9450

Part B: MAS15, horizontal target. LR χ²(6) = 704.4, p = 0; log-likelihood = −1671.4; R²_D = 0.0203 (standardized pseudo-R², test set).

Variable       β         SE        t         p         Likelihood ratio   p
Constant       0.0337    0.0130     2.5918   0.0097
Saliency       0.0287    0.0131     2.1859   0.0341      4.8705           0.3187
Edge–energy    0.2880    0.0177    16.2238   <0.0001   316.5875           0
Relevance      0.1889    0.0176    10.7034   <0.0001   131.5860           0
Orientedness   0.0359    0.0163     2.2012   0.0417      4.9799           0.3190
Verticalness   −0.0546   0.0161    −3.3997   0.0009     11.6278           0.0258

Part C: MAS16, vertical target. LR χ²(6) = 769.5, p = 0; log-likelihood = −1057.6; R²_D = 0.0346 (standardized pseudo-R², test set).

Variable       β         SE        t         p         Likelihood ratio   p
Constant       0.0861    0.0165     5.2211   <0.0001
Saliency       0.0032    0.0164     0.1958   0.7475      0.1470           0.9953
Edge–energy    0.5386    0.0253    21.2555   <0.0001   584.6116           0
Relevance      0.0380    0.0197     1.9298   0.0621      3.9210           0.4398
Orientedness   0.1494    0.0198     7.5346   <0.0001    57.1285           <0.0001
Verticalness   0.1232    0.0201     6.1210   <0.0001    37.5927           <0.0001

Part D: MAS16, horizontal target. LR χ²(6) = 1099.4, p = 0; log-likelihood = −1400.1; R²_D = 0.0372 (standardized pseudo-R², test set).

Variable       β         SE        t         p         Likelihood ratio   p
Constant       0.0775    0.0144     5.3812   <0.0001
Saliency       0.0069    0.0143     0.4846   0.6132      0.4056           0.9694
Edge–energy    0.6241    0.0236    26.4207   <0.0001   953.3296           0
Relevance      0.0025    0.0164     0.1493   0.7738      0.1008           0.9982
Orientedness   0.0707    0.0178     3.9771   0.0002     15.9725           0.0059
Verticalness   0.0224    0.0173     1.2956   0.2320      1.8565           0.7618


Over the longer time scale of weeks, the animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings reveal themselves only once realistic models of peripheral vision are taken into account.

The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 (Figure 8), which, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or the other features.

One limitation of our approach is that although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between particular visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across conditions, it is possible to establish a more direct relationship between visual features at fixation and the factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings in monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce the statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority-map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans but generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that the principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). That study also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed clear differences in power spectra between fixated and nonfixated patches, and also between these spectral differences across the two search tasks.


However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but that this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectrum. Their results provide a cautionary note for the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished, while that of task-relevant features is enhanced, during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency is explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and the fact that the influence of bottom-up features is diminished during search, our study might not be ideal for examining the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach of modeling peripheral degradation for feature extraction and deploying an explaining-away technique for data analysis could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by matching the incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, an approximate internal template allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their power to predict fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that quantifies the exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness/horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search under naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements.


These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork for examining the neural representation of priority, and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19.
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.
Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.
Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34.
Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.
Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.
Cristino, F., & Baddeley, R. J. (2009). The nature of the visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.
Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.
Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.
Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2.
Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.
Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.
Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.
Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.
Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533.
Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98 Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.
Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.
Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. L. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.
Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.
Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25.
Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.
Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.
Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7.
Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.
Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.
Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4.
Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 135, 370–384.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Phillips, A., & Segraves, M. (2010). Predictive activity in macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.
Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7.
Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).
Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13.
Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.
Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6.
Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, USA, 109(19), 7547–7552.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.
't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.
Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4.
Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.
Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.
Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.
Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable whose mean, the probability of a successful outcome, is given by

$p(y = 1 \mid X, \beta) = \dfrac{1}{1 + e^{-X\beta}}$  (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities that the patch was truly fixated or that it was a random shuffled control:

$y \sim \mathrm{Bernoulli}\{p(y = 1 \mid X, \beta)\}$  (A2)

The probability of the outcome is thus defined by the joint influence of the independent variables in the design matrix X. Each row of X corresponds to a fixation, and each column holds one of the visual features extracted from the image patches centered at the fixations; a bias term is also included.

The regression problem is then solved by estimating the coefficients β that minimize the negative log-likelihood of the model, given by

$L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i)$  (A3)


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
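A minimal Python sketch of the same fitting procedure (minimizing the negative log-likelihood in Equation A3 with a gradient-based optimizer, analogous to but not identical with Matlab's glmfit) is given below; the design matrix X and labels y are assumed inputs.

```python
import numpy as np
from scipy.optimize import minimize

def fit_logistic(X, y):
    """Fit beta by minimizing the negative log-likelihood (A3).
    X: (n_fixations, n_features) design matrix; a bias column is appended here."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])    # bias term

    def negloglik(beta):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))         # equation (A1)
        eps = 1e-12                                   # guard against log(0)
        return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    beta0 = np.zeros(Xb.shape[1])
    res = minimize(negloglik, beta0, method="BFGS")   # gradient-based solver
    return res.x, -res.fun                            # coefficients, log-likelihood

# Toy usage: two features, only the first one informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (rng.random(2000) < 1 / (1 + np.exp(-(0.5 + 1.5 * X[:, 0])))).astype(float)
beta_hat, loglik = fit_logistic(X, y)
print(beta_hat, loglik)
```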

Goodness of fit and model comparison using likelihood ratios

We computed likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

$D_0 = -2\log(L_0 / L_{\mathrm{sat}})$  (A4)

$D = -2\log(L / L_{\mathrm{sat}})$  (A5)

where $L_0$, $L_{\mathrm{sat}}$, and $L$ are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration, respectively. The likelihood ratio is then given by

$D - D_0 = -2\log(L / L_0)$  (A6)

A negative likelihood ratio suggests that the model under consideration significantly outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
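In code, the likelihood-ratio comparison against the null model amounts to a chi-squared tail probability; the sketch below uses made-up log-likelihood values purely for illustration.

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_model, loglik_null, df):
    """Likelihood-ratio statistic 2*log(L/L0) and its chi-squared p value,
    with df equal to the number of extra parameters in the larger model."""
    lr = 2.0 * (loglik_model - loglik_null)
    return lr, chi2.sf(lr, df)

# Hypothetical example: a full model with 5 features against an intercept-only null.
lr, p = likelihood_ratio_test(loglik_model=-1000.0, loglik_null=-1050.0, df=5)
print(f"LR = {lr:.1f}, p = {p:.3g}")
```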

Relative pseudo-R2

A metric related to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² is to map the likelihood ratio into the [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, $R_D^2 = 1 - L/L_0$, where $L$ is the log-likelihood of the model under consideration and $L_0$ is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature or of a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power gained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2014). The relative measure is defined as $1 - L_{\mathrm{full}}/L_{\mathrm{partial}}$, where $L_{\mathrm{full}}$ and $L_{\mathrm{partial}}$ are the log-likelihoods of the full and partial models. Confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R² was in each case computed on the held-out test set.
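A sketch of this leave-one-feature-out procedure, computing the relative pseudo-R² of each feature on held-out folds, is given below (assumed inputs: a fixation-by-feature matrix X, binary labels y distinguishing fixated patches from shuffled controls, and a list of feature names; not the authors' implementation).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def _test_loglik(model, X, y):
    """Log-likelihood of held-out data under a fitted logistic model."""
    p = model.predict_proba(X)[:, 1]
    eps = 1e-12
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def relative_pseudo_r2(X, y, feature_names, n_folds=10):
    """Relative pseudo-R2 of each feature: 1 - L_full / L_partial on the test set,
    where the partial model leaves that one feature out."""
    effects = {name: [] for name in feature_names}
    for train, test in KFold(n_splits=n_folds, shuffle=True, random_state=0).split(X):
        full = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        l_full = _test_loglik(full, X[test], y[test])
        for j, name in enumerate(feature_names):
            keep = [k for k in range(X.shape[1]) if k != j]
            partial = LogisticRegression(max_iter=1000).fit(X[train][:, keep], y[train])
            l_partial = _test_loglik(partial, X[test][:, keep], y[test])
            effects[name].append(1.0 - l_full / l_partial)
    # Mean effect size and standard error of the mean across folds.
    return {name: (np.mean(v), np.std(v) / np.sqrt(n_folds)) for name, v in effects.items()}
```

Because the log-likelihoods are evaluated on the held-out test set, a positive relative pseudo-R² means that the left-out feature adds predictive power beyond the remaining features.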


Page 5: Pavan Ramkumar - Modeling peripheral visual acuity enables discovery …pavanramkumar.github.io/pdfs/08-Ramkumar_etal_JVis_2015.pdf · 2017-01-24 · Modeling peripheral visual acuity

patches at fixated locations after peripherally blurringthem using a retinal transform The patch size specifiesthe size of the spotlight in the periphery over which visualfeatures are pooled prior to saccade decisions We chosepatch sizes of 18middot 18 28middot 28 and 48middot 48 and performedall analyses separately for each patch size

We computed one bottom-up feature (saliency) onetask-specific feature (relevance) and three low-levelorientation statistics (edgendashenergy orientedness andpredominant orientation)

Bottom-up saliency

Bottom-up saliency is the extent to which an imagepatch stands out with respect to its immediate neigh-borhood Typically saliency at a particular location is

operationally defined as the difference between a featurersquos(or set of features) value at that location and the featurersquos(or set of features) value in the immediately surroundingneighborhood This center-surround difference map istypically computed formultiple features atmultiple scalesand then summed together to obtain a saliency map (Ittiamp Koch 2001) A number of bottom-up saliency mapshave been proposed (see Borji Sihite amp Itti 2013 for asurvey) We used the most popular model (Itti et al1998) implemented in the Matlab Saliency Toolbox(wwwsaliencytoolboxnet Walther amp Koch 2006) witheight orientations and four scales (see Figure 3A)

Top-down relevance or target-similarity

The relevance of a fixated image patch must measurethe extent to which the information in the patch is

Figure 3 Computation of priority maps based on bottom-up saliency and top-down relevance The yellow crosshair represents the

fixation with respect to which the peripheral blurring operator is applied (A) Saliency Itti-Koch saliency (Itti et al 1998) computes

luminance contrast and orientation contrast at four different scales and eight different orientations and adds them across scales to

produce a saliency map (B) Relevance A Gabor-detector computes template Gabor filter outputs for a quadrature pair (in-phase

Gabor and its 908 phase shift) and takes the sum of squares to produce a relevance map Vertical and horizontal relevance maps are

computed for modeling the vertical and horizontal tasks respectively Only the vertical relevance map is shown here

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 5

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

relevant for the search task We assume a greedysearcher who looks at patches that maximally resem-bled the target

Before computing a similarity measure to the targetwe locally standardized the grayscale values of thepatch and then linearly stretched them to lie between 0and 1

Zethx yTHORN frac14 Iethx yTHORN lr

eth2THORN

Uethx yTHORN frac14 Zethx yTHORN minethZTHORNmaxethZTHORN minethZTHORN eth3THORN

We then defined target relevance of a fixated imagepatch as the sum of squares of the convolution of thefixated image patch with the target Gabor wavelet andits quadrature phase pair

R frac14 ethUGTHORN2 thorn ethUG0THORN2 eth4THORNwhere G and G0 are the Gabor target and its 908 phase-shifted version and represents two-dimensional con-volution The scalar relevance value that is used forsubsequent analysis is then the central pixel value of theconvolved image which happens to be equivalent to thesum of squares of dot products between the patch and thequadrature Gabor pairs for the 28 middot 28 image patch

For illustration the relevance maps visualized inFigures 2 and 3B are obtained by taking the sum ofsquares of the convolution between the target Gaborquadrature pairs and the full-sized contrast-normalizedimage

Orientation statistics

Aside from the composite measures of saliency andrelevance we extracted local orientation statistics usingan eigenvalue decomposition of the energy tensorcomputed as the covariance matrix of horizontal andvertical edge gradients at each pixel (Figure 4 Ganguliet al 2010)

Ixethx yTHORN frac14]Iethx yTHORN

]xeth5THORN

Iyethx yTHORN frac14]Iethx yTHORN

]yeth6THORN

Cethx yTHORN frac14 covethIx IxTHORNcovethIx IyTHORN

covethIx IyTHORNcovethIy IyTHORN

eth7THORN

where Ix and Iy are horizontal and vertical gradientsand C(x y) is the energy tensor Horizontal andvertical image gradients were computed as differencesbetween consecutive columns and rows of the imagerespectively From the eigenvalues of C(x y) k1 and

k2 we then computed the following orientationstatistics

The edgendashenergy of the patch is given by Efrac14k1thornk2the eigendirection h frac14 tan1[cov(Ix Ix) k2][cov(IxIy)] gives the predominant orientation and the ratio ofvariance along the eigendirection and its orthogonaldirection ffrac14 (k1 k2)(k1 thorn k2) gives the orientednessof the patch (0 for no orientation 1 for bars and edgesin any direction) We then computed measure forverticalness as tfrac14 jsin hj Verticalness is 1 for aperfectly vertical patch and 0 for a perfectly horizontalpatch

Analysis of fixation statistics

Our goal was to quantify the relative importanceof each visual statistic in predicting fixation behaviorand to study how these influences evolved overdifferent time scales The typical method to measurethe predictive power of a visual statistic is tocompare its value at fixated versus nonfixated pointsin the stimulus image However the stimuli arehuman-photographed scenes with objects of interestnear the center therefore fixated locations tend tobe closer to the center of the image rather thanrandomly sampled nonfixated locations This isknown as the center bias and can potentially inflateeffect size for a given statistic (eg Kanan TongZhang amp Cottrell 2009 Tseng Carmi CameronMunoz amp Itti 2009) To avoid center bias shuffled-control statistics are taken from the same location asthe current fixation but from a different image Inall subsequent analyses of visual statistics at fixationwe compared real visual features at fixation withshuffled controls computed as described above andalways excluded the very last fixation to avoidconfounds from the target itself

Fourier analysis

Local power spectra of natural image patches arehighly informative about scene texture and structure(Oliva amp Torralba 2001) and can be diagnostic ofnatural scene categories (Torralba amp Oliva 2003) Inparticular the two-dimensional Fast Fourier Trans-form (FFT) allows us to visualize Fourier amplitude atdifferent spatial frequencies and orientations at thesame time To obtain a qualitative understanding ofwhether visual statistics at fixation were biased by thesought-after Gabor target we studied the similaritybetween the target and fixated image patches byexamining their spectral properties (see eg Hwang etal 2009) We computed local Fourier amplitude(absolute value of the FFT) in 18 middot 18 28 middot 28 and 48 middot48 windows at each fixation using the real image and a

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 6

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

shuffle-control image We averaged the obtained power

spectra across fixations for each trial and then

computed a two-tailed t statistic across trials of the

difference in log absolute FFT between fixated patches

and shuffled-controls

Receiver operating characteristic analysis

To study the extent to which each visual featurecould predict gaze we performed a receiver operatingcharacteristic (ROC) curve analysis comparing thedistributions of fixated patches and shuffled controls

Figure 4 Orientation statistics For each natural scene we computed the local orientation statistics at all fixated and shuffled-control

patches edgendashenergy defined as the sum of squares of local gradients orientedness defined as a normalized ratio of eigenvalues of

the local orientation covariance matrix which measures the extent to which a single orientation is predominant and the predominant

orientation defined as the eigendirection Verticalness is defined as the absolute sine of the predominant orientation

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 7

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

over each visual feature We computed the area underthe ROC curve (AUC) for each feature and boot-strapped across fixations to obtain 95 confidenceintervals (CIs) for the AUCs

Analysis of visual feature influence within a trial

Although analysis of visual features at fixationsacross trials and sessions provide a general idea ofwhich features influence saccade planning they im-plicitly assume that the strategy of planning saccadesremains unchanged over time Since our goal is toobtain an insight into how search strategy evolves overtime we studied the difference between visual featuresat early and late fixations To understand how searchstrategy evolves within a single trial we computed theSpearmanrsquos rank correlation coefficient between dif-ferent visual features and two eye-movement variablesthe numbered order of fixation within a trial (fixationorder) and the saccade length made to reach thatfixation location along with their correspondingbootstrapped 95 confidence intervals

Analysis of visual feature analysis over sessions usingmultivariate decoding

Although ROC analysis provides a useful charac-terization of the influence of individual features it is ametric that compares two one-dimensional distribu-tions The AUC is a good measure of the importance ofa given variable but comparing two AUCs obtainedwith two different variables is not meaningful as ametric of relative importance if the variables arecorrelated However most features that inform bot-tom-up saliency are correlated with each other and withlocal contrast (Kienzle Franz Scholkopf amp Wich-mann 2009) Further in our task the Gabor-similaritymetric (relevance) may be correlated with orientationor edgendashenergy Therefore a large AUC for relevancecould merely result from the effect of a large AUC foredgendashenergy or vice versa

Multivariate analysis is a suitable tool for potentiallyexplaining away the effects of correlated variables Forinstance over the past few years generalized linearmodels (GLMs Nelder amp Wedderburn 1972) havebeen used to model spike trains when neural activitymay depend on multiple potentially correlated vari-ables (Fernandes et al 2013) Here we used aparticular GLM multivariate logistic regression todiscriminate fixated patches from shuffled controlsbased on the local visual features defined abovesaliency relevance edgendashenergy orientedness andverticalness defined as the sine of the predominantorientation

Using this decoding technique we estimated effectsizes for each feature using partial models leaving each

feature out of the full model comprising all featuresand comparing these partial models with the full model(see Appendix A for details on logistic regressiongoodness of fit characterization and effect sizeestimation) For this analysis we pooled fixationsacross four consecutive sessions for each search taskand each animal

Results

To understand the relative contributions of visualfeatures that guide saccades during naturalistic visionwe had two monkeys search for embedded Gaborwavelets in human-photographed natural scenesMonkeys performed the task over two sets of fourconsecutive days with the sets differing on theorientation of the target vertical in one set andhorizontal in the other We modeled the effects ofdecreasing retinal acuity as a function of eccentricityand analyzed visual features at fixated locations innatural scenes to ask how the features of the targetstimulus (task-relevant) as well as other features of thevisual scene (bottom-up) affect the search behavior Wethen studied how the influence of these visual featuresevolved across multiple time scales

Search performance

We first wanted to see if the monkeys couldsuccessfully perform the task that is if they couldsuccessfully find the target in a considerable number ofthe 700ndash1500 trials they completed on each of the eightdays We found that while one monkey (MAS15)performed better than the other (MAS16) as measuredby the fraction of successful trials (65 vs 41) and bythe mean number of fixations required to find thetarget (71 vs 83) both monkeys were able to find thetarget in each session (Supplementary Figure S2)Although there are differences in performance consis-tent with previously observed behavior of bothmonkeys (Kuo Kording amp Segraves 2012) they wereable to perform the task

Fourier analysis reveals the effect of edgendashenergy and target orientation

Gabor wavelets have a well-defined spectral signa-ture (Figure 5 first row) Thus we may reasonablyexpect saccade target selection to be guided by thelocal frequency content in natural scenes To get aninsight into this possibility we computed 2-D Fourierpower spectra at fixated and shuffled-control patches

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 8

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

at three different patch sizes (18 middot 18 28 middot 28 and 48 middot48) We observed that the fixated patches have higheramplitude than nonfixated patches (positive t scoresFigure 5) suggesting that monkeys are biased to lookat locations of high amplitude As expected highedgendashenergy patches are attractive targets for fixa-tions

Crucially this same Fourier analysis allows us toinquire if the peaks in energy are consistent with thetarget orientation Indeed we also observed that thefixated patches have higher energy along the targetorientation In particular we found higher energyalong the vertical direction for the vertical task andalong the horizontal direction for the horizontal task(Figure 5) and that this effect appears was morepronounced for 48 middot 48 patches than 18 middot 18 patchesThe greater concentration of target power for the 48 middot48 case might be consistent with motor noise inexecuting saccades where the saccade lands slightlyaway from the intended location Thus Fourieranalysis provides a clear and intuitive visualizationthat monkeys fixate spots where the target orientationis dominant and this effect is better detected when

visual features are averaged over larger areas in theperiphery

ROC Analysis

Fourier analysis suggested that energy and task-specific target orientation were diagnostic of fixatedpatches during natural scene search Motivated bythis observation we intended to quantify the abilityof visual features to predict fixations We extractedlocal visual features saliency edgendashenergy relevanceorientedness and verticalness (see Methods) fromfixated patches and shuffled controls and thencompared them using ROC analysis We found thaton average across animals and sessions for each taskfixated patches were more salient had higher edge-energy were more relevant and had a higherverticalness than shuffled-control patches regardlessof the task however orientedness was not diagnosticof the difference between fixated and nonfixatedpatches1 (Figure 6) According to the analysis of onefeature at a time saliency edgendashenergy relevance andverticalness are all predictive of fixations

Figure 5 Spectral analysis of fixation locations Monkey fixation locations exhibit target-dependent orientation bias A vertical task B

horizontal task Top row Target Gabor for the search task (left) and its power spectrum (right) Bottom three rows the mean t statistic

across four consecutive days of the difference between log Fourier power of fixated patches and shuffled controls for monkeys

MAS15 (left) and MAS 16 (right) for three different patch sizes from top to bottom 18 middot 18 28 middot 28 and 48 middot 48 Fourier power is

clearly enhanced along the direction of target orientation Units on the color bars indicate the mean t score

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 9

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Evolution of search strategy within a trial

Our goal was to study the temporal evolution of search strategy at multiple time scales. To understand how saccade planning evolved at the relatively short time scale of a single trial, we correlated the visual features at fixation with the fixation order (first, second, third, etc.) and the length of the saccade made to land at that fixation. We observed that fixated locations had different features at early and late fixations. In particular, the saliency, edge–energy, relevance, orientedness, and verticalness of fixated patches increased from the beginning to the end of the trial (Figure 7 and Supplementary Figure S3, filled black diamonds; p < 0.0001 for both tasks). We also found that the saliency, edge–energy, relevance, and verticalness of fixated patches were negatively correlated (p < 0.0001 for both animals and tasks) with the length of the saccade made to get to that location (Figure 7 and Supplementary Figure S2, filled blue squares). By contrast, orientedness was positively correlated with both fixation number and saccade length (Figure 7 and Supplementary Figure S2). These effects are abolished, as expected, when the same analysis is performed with shuffled controls instead of fixated patches (unfilled black diamonds and blue squares in Figure 7 and Supplementary Figure S2). Crucially, these effects are only consistent across animals and tasks when a retinal transform is applied (last data point in all panels of Figure 7 and Supplementary Figure S2). We further verified that these phenomena are independent; that is, although patches of high relevance or edge–energy are fixated later in the trial by short saccades, this is not merely because later saccades tend to be shorter on average. In particular, we found significant but very weak correlations between fixation order and saccade length for either monkey (across tasks, r = −0.03 for the vertical task and r = 0.02 for the horizontal task, p < 0.01 for both correlations), which were not sufficient to explain the high correlations observed with visual features at fixation. Together, these results show how search strategy evolves during the trial, with predictive visual features being fixated later in the trial.
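As a rough illustration of this within-trial analysis (a sketch with simulated placeholder values, not the recorded fixations), Spearman rank correlations between a feature and fixation order or incoming-saccade length can be computed as follows.

```python
# Sketch: within-trial evolution of one visual feature, quantified as
# Spearman rank correlations with fixation order and saccade length.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_fix = 5000                                   # fixations pooled over trials
fixation_order = rng.integers(1, 12, n_fix)    # 1st, 2nd, 3rd, ... fixation
saccade_length = rng.gamma(2.0, 3.0, n_fix)    # degrees (placeholder values)
edge_energy = rng.lognormal(0.0, 1.0, n_fix)   # feature at each fixation

rho_order, p_order = spearmanr(edge_energy, fixation_order)
rho_len, p_len = spearmanr(edge_energy, saccade_length)
print(f"vs. fixation order: rho={rho_order:.3f}, p={p_order:.3g}")
print(f"vs. saccade length: rho={rho_len:.3f}, p={p_len:.3g}")
```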

Multivariate analysis of the evolution of search strategy over days

Although ROC analysis suggests that saliency, relevance, and edge–energy can predict fixations, one fundamental limitation of the technique is that it can only quantify the effect of one variable at a time. However, in practice, these variables tend to be correlated. In our data, for instance, relevance is clearly correlated with edge–energy (ρ = 0.25, p < 0.0001; 16 sessions, two animals, 2° × 2° patch with retinal transform, 57,775 fixations), and edge–energy is correlated with saliency (ρ = 0.02, p < 0.0001). The correlation is problematic because the true effect of edge–energy might manifest as an apparent effect of saliency or relevance. To address this concern, we applied multivariate logistic regression to decode fixated from nonfixated patches as a function of local visual features (see Methods).
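The decoding step can be sketched as follows (hypothetical feature matrix and labels; the authors' actual pipeline and feature values are not reproduced here): a multivariate logistic regression classifier evaluated with 10-fold cross-validation, in the spirit of Table 1.

```python
# Sketch: decode fixated patches vs. shuffled controls from local visual
# features with multivariate logistic regression and 10-fold CV.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 20000                                    # fixated patches + matched controls
# columns: saliency, edge-energy, relevance, orientedness, verticalness
X = rng.normal(size=(n, 5))                  # placeholder standardized features
y = rng.integers(0, 2, n)                    # 1 = fixated, 0 = shuffled control

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"decoding accuracy: {acc.mean()*100:.1f}% ± {acc.std()*100:.1f}%")
```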

Visual features could discriminate fixated from nonfixated patches

Using multivariate logistic regression, we asked if fixated patches could be decoded from shuffled-control patches. We found that fixated patches and shuffled controls could be decoded reliably above chance level (50%). Decoding accuracies did not differ significantly across tasks (vertical vs. horizontal target). Fixated patches were better decoded when peripheral visual

Figure 6. Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distributions of visual features at fixated patches and shuffled controls across approximately 55,000 fixations (pooled across 16 sessions from two monkeys) for each task and each patch size. Black bars represent the vertical Gabor task and red curves represent the horizontal task. Error bars show bootstrapped 95% confidence intervals.


Figure 7. How does search strategy evolve within a trial? Spearman's rank correlation coefficients between fixation order (black diamonds) or saccade length (blue squares) and the visual features saliency, edge–energy, relevance, orientedness, and verticalness are shown for one monkey (MAS15) for each task. Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls. Error bars show bootstrapped 95% confidence intervals. The symbols on the x-axis indicate the image processing done prior to feature computation: the first three data points correspond to a retinal transform followed by averaging over 1° × 1°, 2° × 2°, and 4° × 4° windows, respectively; the fourth data point indicates averaging over a 2° × 2° window without applying any retinal transform. The star indicates a significance level of p < 0.0001.


patches were not degraded by the retinal transform (last column of Table 1).

Relative influence of visual features on saccade choice

Having established that the visual features at fixation could predict eye movement behavior, we were interested in the relative influence of individual features. By decoding fixated patches from shuffled controls in the two tasks, we found that edge–energy was predominantly more influential than all other features in predicting fixations, and saliency was completely explained away, for both monkeys (top two panels in Figure 8; Table 2). These results appear to agree with findings in human studies, which suggest that bottom-up saliency plays a minimal role in explaining saccades during search tasks (e.g., Einhäuser et al., 2008). However, it must be noted that if edge–energy is a bottom-up feature, then bottom-up features appear to play an important role in gaze prediction even during search.

Given the demands of the search task (finding a Gabor target in a natural scene), it is worth considering the possibility that edge–energy is not strictly a bottom-up feature, but that locations with high edge–energy are sought after by the monkeys simply because Gabor wavelets have high edge–energy. Although seeking high-energy patches is one possible search strategy, our Fourier analysis (Figure 5) and ROC analysis (Figure 6) suggest that target similarity (relevance) or target orientation also plays a role. Indeed, when using multivariate decoding, we found that task-relevant features had nonzero effects (bottom three panels in Figure 8; Table 2). In particular, relevance played an important role for MAS15 but not MAS16, whereas the reverse was true for orientedness (Figure 8, panels 3 and 4 from the top; Table 2). On the basis of this result, it is not possible to conclude whether the internal search template guiding top-down saccade choice comprises the target in its entirety (relevance), orientedness, or the target orientation (verticalness); no single strategy dominates across the data from both animals and tasks, but all seem to be important.

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategy over consecutive days, monkey MAS15 had performed the vertical search task ahead of the horizontal task, and the order of tasks was swapped for MAS16. Since these tasks were separated by a few weeks, we had the opportunity to compare search strategy across tasks. Indeed, for the 2° × 2° window, we observed a significant difference in the effect size of verticalness (a feature that captures the predominant orientation) across the two tasks. In particular, for MAS15, who had performed the horizontal task later, verticalness is significantly predictive of saccades in the horizontal task, with shuffled-control patches being more vertical than fixated patches (β ± SE = −0.0546 ± 0.0161, p = 0.0009; see Table 2B and bottom panels in Figure 8). For MAS16, who had performed the vertical task later, verticalness is significantly predictive of saccades in the vertical task, with fixated patches being more vertical than shuffled-control patches (β ± SE = 0.1232 ± 0.0201, p < 0.0001; see Table 2C and bottom panels in Figure 8). This practice effect of verticalness also correlates with search performance, as quantified by the number of fixations required to find the target. In particular, a Wilcoxon rank-sum test for different median numbers of fixations in the earlier and later tasks suggests that both monkeys improved across tasks (p < 0.00001, z = 29.1, median of 6 vs. 4 fixations for MAS15, and p < 0.00001, z = 8.8, median of 8 vs. 7 fixations for MAS16). From these results, it appears that animals gradually learn to place more emphasis on

Decoding accuracy (%)        Retinal transform   Retinal transform   Retinal transform   No retinal transform
                             1° × 1°             2° × 2°             4° × 4°             2° × 2°
MAS15, vertical task         55.7 ± 0.9          55.5 ± 0.6          56.8 ± 0.5          61.5 ± 0.7
MAS15, horizontal task       56.5 ± 0.6          56.4 ± 0.8          56.6 ± 0.8          64.3 ± 1.0
MAS16, vertical task         58.4 ± 0.6          60.2 ± 1.3          60.5 ± 1.4          62.9 ± 0.7
MAS16, horizontal task       59.2 ± 0.8          60.9 ± 0.6          62.3 ± 1.3          62.3 ± 1.0

Table 1. Decoding accuracies (mean percentage of correctly predicted patches ± standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.


the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.
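For completeness, the practice-effect comparison described above (median number of fixations needed to find the target in the earlier vs. later task) is a Wilcoxon rank-sum test; a minimal sketch with simulated counts (not the monkeys' data) is shown below.

```python
# Sketch: Wilcoxon rank-sum comparison of fixation counts between the
# earlier and later search task (placeholder distributions).
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(2)
fixations_early_task = rng.poisson(6, 2000) + 1
fixations_later_task = rng.poisson(4, 2000) + 1

stat, p = ranksums(fixations_early_task, fixations_later_task)
print(f"rank-sum z = {stat:.1f}, p = {p:.2g}, "
      f"medians = {np.median(fixations_early_task):.0f} vs. "
      f"{np.median(fixations_later_task):.0f}")
```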

Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance, and how they evolved over time, when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.


features while allowing for bottom-up features to explain them away. We found that over the short time scale of a single trial, monkeys preferred to fixate target locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades. As the trial progressed, locations with higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated. Over the longer time scale of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was

Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and p values; log-likelihoods, chi-squared (χ²) statistics, and pseudo-R²s (R_D²); as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A. MAS15, vertical target. LR χ²(6) = 704.4, p = 0; log-likelihood = −23992; R_D² = 0.0143, standardized pseudo-R² (test set).

Variable        β         SE        t          p          Likelihood ratio    p
Constant        0.0180    0.0108     1.6706    0.0954
Saliency        0.0085    0.0108     0.7809    0.4503         0.6797          0.9442
Edge–energy     0.2612    0.0131    19.8995    <0.0001      455.2963          0
Relevance       0.1154    0.0133     8.6978    <0.0001       83.1493          <0.0001
Orientedness    0.0268    0.0120     2.2281    0.0319         5.0343          0.3035
Verticalness    0.0096    0.0126     0.7664    0.4607         0.6666          0.9450

Part B. MAS15, horizontal target. LR χ²(6) = 704.4, p = 0; log-likelihood = −16714; R_D² = 0.0203, standardized pseudo-R² (test set).

Variable        β         SE        t          p          Likelihood ratio    p
Constant        0.0337    0.0130     2.5918    0.0097
Saliency        0.0287    0.0131     2.1859    0.0341         4.8705          0.3187
Edge–energy     0.2880    0.0177    16.2238    <0.0001      316.5875          0
Relevance       0.1889    0.0176    10.7034    <0.0001      131.5860          0
Orientedness    0.0359    0.0163     2.2012    0.0417         4.9799          0.3190
Verticalness   −0.0546    0.0161    −3.3997    0.0009        11.6278          0.0258

Part C. MAS16, vertical target. LR χ²(6) = 769.5, p = 0; log-likelihood = −10576; R_D² = 0.0346, standardized pseudo-R² (test set).

Variable        β         SE        t          p          Likelihood ratio    p
Constant        0.0861    0.0165     5.2211    <0.0001
Saliency        0.0032    0.0164     0.1958    0.7475         0.1470          0.9953
Edge–energy     0.5386    0.0253    21.2555    <0.0001      584.6116          0
Relevance       0.0380    0.0197     1.9298    0.0621         3.9210          0.4398
Orientedness    0.1494    0.0198     7.5346    <0.0001       57.1285          <0.0001
Verticalness    0.1232    0.0201     6.1210    <0.0001       37.5927          <0.0001

Part D. MAS16, horizontal target. LR χ²(6) = 1099.4, p = 0; log-likelihood = −14001; R_D² = 0.0372, standardized pseudo-R² (test set).

Variable        β         SE        t          p          Likelihood ratio    p
Constant        0.0775    0.0144     5.3812    <0.0001
Saliency        0.0069    0.0143     0.4846    0.6132         0.4056          0.9694
Edge–energy     0.6241    0.0236    26.4207    <0.0001      953.3296          0
Relevance       0.0025    0.0164     0.1493    0.7738         0.1008          0.9982
Orientedness    0.0707    0.0178     3.9771    0.0002        15.9725          0.0059
Verticalness    0.0224    0.0173     1.2956    0.2320         1.8565          0.7618


associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 does (Figure 8), and edge–energy, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be a consequence of the 2-D autocorrelation functions of visual features around fixations: that of edge–energy might be more spread out than those of saliency, relevance, or the other features.

One limitation of our approach is that, although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between particular visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we did not parametrically vary edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it would be possible to establish a more direct relationship between visual features at fixation and the factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings in monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan follows a greedy strategy that maximizes accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce the statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority-map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans but generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). They also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed clear differences in power spectra between fixated and nonfixated patches, and these spectral differences themselves differed across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of


potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but that this variance largely emerges from higher-order statistics, which are quantifiable using bispectral densities and therefore invisible in the power spectrum. Their results provide a cautionary note for the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished, while that of task-relevant features is enhanced, during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and given that the influence of bottom-up features is diminished during search, our study might not be ideal for examining the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales predicted fixation behavior better than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, which models peripheral degradation for feature extraction and deploys an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by matching incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their power to predict fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that is maximized by an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness or horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search in naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also


enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.

Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19.

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.

Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.

Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34.

Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.

Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.

Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.

Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.

Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.

Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2.

Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.

Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.

Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.

Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]

Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98 Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.

Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.

Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.

Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. L. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.

Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.

Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25.

Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.

Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.

Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7.

Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.

Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.

Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.

Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8.

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.

Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.

Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4.

Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.

Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).

Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.

Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7.

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).

Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13.

Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.

Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6.

Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.

't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.

Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4.

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.

Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.

Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.

Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.

Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.

Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classification problems; the two classes in our problem are represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with the mean probability of a successful outcome given by

\[ p(y = 1 \mid X\beta) = \frac{1}{1 + e^{-X\beta}} \]  (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities that the patch was truly fixated or that it was a random shuffled control.

\[ y \sim \mathrm{Bernoulli}\{\, p(y = 1 \mid X\beta) \,\} \]  (A2)

The probability of the outcome is then defined by the

joint influence of the independent variables in the design matrix X. Here, each column of X contains one of the visual features extracted from the image patches centered at each fixation, and each row corresponds to a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients β that minimize the negative log-likelihood of the model, given by

\[ L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i) \]  (A3)


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression has no closed-form solution; it is, however, a convex problem and is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
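The paper fit the model with Matlab's glmfit; a rough Python equivalent using statsmodels is sketched below, with a hypothetical design matrix X (one row per patch, one column per visual feature) and labels y (1 = fixated, 0 = shuffled control).

```python
# Sketch: logistic regression fit analogous to Matlab's glmfit with a
# binomial family and logit link (placeholder data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))           # saliency, edge-energy, relevance,
                                         # orientedness, verticalness
y = rng.integers(0, 2, 1000)

Xc = sm.add_constant(X)                  # bias term, as in the appendix
model = sm.GLM(y, Xc, family=sm.families.Binomial())
fit = model.fit()
print(fit.summary())                     # coefficients, SEs, z scores, p values
print("log-likelihood:", fit.llf)
```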

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

\[ D_0 = -2\log(L_0 / L_{\mathrm{sat}}) \]  (A4)

\[ D = -2\log(L / L_{\mathrm{sat}}) \]  (A5)

where L_0, L_sat, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

\[ D - D_0 = -2\log(L / L_0) \]  (A6)

A negative likelihood ratio suggests that the model under consideration significantly outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.

Relative pseudo-R2

A metric related to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, \( R_D^2 = 1 - L/L_0 \), where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2014). The relative measure is defined as \( 1 - L_{\mathrm{full}} / L_{\mathrm{partial}} \), where L_full and L_partial are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
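A minimal sketch of this leave-one-feature-out relative pseudo-R² computation, assuming a generic numeric feature matrix and the 10-fold cross-validation scheme described above (the helper function names are hypothetical), might look like this:

```python
# Sketch: relative pseudo-R2 of one feature, computed as
# 1 - LL_full / LL_partial on held-out data, where the partial model
# omits that feature (leave-one-feature-out).
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import KFold

def heldout_loglik(X_tr, y_tr, X_te, y_te):
    """Bernoulli log-likelihood of a fitted logistic model on test data."""
    fit = sm.GLM(y_tr, sm.add_constant(X_tr),
                 family=sm.families.Binomial()).fit()
    p = fit.predict(sm.add_constant(X_te))
    eps = 1e-12
    return np.sum(y_te * np.log(p + eps) + (1 - y_te) * np.log(1 - p + eps))

def relative_pseudo_r2(X, y, feature_idx, n_folds=10):
    scores = []
    for tr, te in KFold(n_folds, shuffle=True, random_state=0).split(X):
        ll_full = heldout_loglik(X[tr], y[tr], X[te], y[te])
        X_part = np.delete(X, feature_idx, axis=1)   # drop one feature
        ll_part = heldout_loglik(X_part[tr], y[tr], X_part[te], y[te])
        scores.append(1.0 - ll_full / ll_part)       # relative pseudo-R2
    return np.mean(scores), np.std(scores) / np.sqrt(n_folds)
```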


Fourier analysis suggested that energy and task-specific target orientation were diagnostic of fixatedpatches during natural scene search Motivated bythis observation we intended to quantify the abilityof visual features to predict fixations We extractedlocal visual features saliency edgendashenergy relevanceorientedness and verticalness (see Methods) fromfixated patches and shuffled controls and thencompared them using ROC analysis We found thaton average across animals and sessions for each taskfixated patches were more salient had higher edge-energy were more relevant and had a higherverticalness than shuffled-control patches regardlessof the task however orientedness was not diagnosticof the difference between fixated and nonfixatedpatches1 (Figure 6) According to the analysis of onefeature at a time saliency edgendashenergy relevance andverticalness are all predictive of fixations

Figure 5 Spectral analysis of fixation locations Monkey fixation locations exhibit target-dependent orientation bias A vertical task B

horizontal task Top row Target Gabor for the search task (left) and its power spectrum (right) Bottom three rows the mean t statistic

across four consecutive days of the difference between log Fourier power of fixated patches and shuffled controls for monkeys

MAS15 (left) and MAS 16 (right) for three different patch sizes from top to bottom 18 middot 18 28 middot 28 and 48 middot 48 Fourier power is

clearly enhanced along the direction of target orientation Units on the color bars indicate the mean t score

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 9

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Evolution of search strategy within a trial

Our goal was to study the temporal evolution ofsearch strategy at multiple time scales To understandhow saccade planning evolved at the relatively shorttime scale of within a single trial we correlated thevisual features at fixation with the fixation order (firstsecond third etc) and the length of the saccade madeto land at that fixation We observed that fixatedlocations had different features at early and latefixations In particular saliency edge-energy rele-vance orientedness and verticalness of fixated patchesincreased from the beginning to the end of the trial(Figure 7 and Supplementary Figure S3 filled blackdiamonds p 00001 for both tasks) We also foundthat saliency edge-energy relevance and verticalness offixated patches were negatively correlated p 00001for both animals and tasks with the length of thesaccade made to get to that location (Figure 7 andSupplementary Figure S2 filled blue squares) Bycontrast orientedness was positively correlated withfixation number and saccade length (Figure 7 andSupplementary Figure S2) These effects are abolishedas expected when the same analysis is performed withshuffled controls instead of fixated patches (unfilledblack diamonds and blue squares in Figure 7 andSupplementary Figure S2) Crucially these effects areonly consistent across animals and tasks when a retinaltransform is applied (last data point in all panels ofFigure 7 and Supplementary Figure S2) We furtherverified that these phenomena are independent that isalthough patches of high relevance or edge-energy arefixated early on in the trial by short saccades this is notmerely because early saccades are short on average Inparticular we found significant but very weak corre-lations between fixation order and saccade length foreither monkey (across tasks r frac14003 for the verticaltask and r frac14 002 for the horizontal task p 001 for

both correlations) which were not sufficient to explainthe high correlations observed with visual features atfixation Together these results show how searchstrategy evolves during the trial with predictive visualfeatures being fixated later in the trial

Multivariate analysis of the evolution of searchstrategy over days

Although ROC analysis suggests that saliencyrelevance and edgendashenergy can predict fixations onefundamental limitation of the technique is that it canonly quantify the effect of one variable at a timeHowever in practice these variables tend to becorrelated In our data for instance relevance is clearlycorrelated with edgendashenergy (q frac14 025 p 00001 16sessions two animals 28 middot 28 patch with retinaltransform 57775 fixations) and edgendashenergy is corre-lated with saliency q frac14 002 p 00001 Thecorrelation is problematic because the true effect ofedgendashenergy might manifest as an apparent effect ofsaliency or relevance To address this concern weapplied multivariate logistic regression to decodefixated from nonfixated patches as a function of localvisual features (see Methods)

Visual features could discriminate fixated fromnonfixated patches

Using multivariate logistic regression we asked iffixated patches could be decoded from shuffled-controlpatches We found that fixated patches and shuffledcontrols could be decoded reliably above chance level(50) Decoding accuracies did not differ significantlyacross tasks (vertical vs horizontal target) Fixatedpatches were better decoded when peripheral visual

Figure 6. Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distribution of visual features at fixated patches and shuffled controls across approximately 55,000 fixations (pooled across 16 sessions from two monkeys) for each task and each patch size. Black bars represent the vertical Gabor task and red curves represent the horizontal task. Error bars show bootstrapped 95% confidence intervals.


Figure 7. How does search strategy evolve within a trial? Spearman's rank correlation coefficients between fixation order (black diamonds) or saccade length (blue squares) and visual features (saliency, edge–energy, relevance, orientedness, and verticalness) are shown for one monkey (MAS15) for each task. Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls. Error bars show bootstrapped 95% confidence intervals. The symbols on the x-axis indicate the image processing done prior to feature computation: the first three data points correspond to a retinal transform followed by averaging over 1° × 1°, 2° × 2°, and 4° × 4° windows, respectively; the fourth data point indicates averaging over a 2° × 2° window without applying any retinal transform. The star indicates a significance level of p < 0.0001.



Relative influence of visual features on saccade choice

Having established that the visual features at fixation could predict eye movement behavior, we were interested in the relative influence of individual features. By decoding fixated patches from shuffled controls in the two tasks, we found that edge–energy was predominantly more influential than all other features in predicting fixations, and saliency was completely explained away for both monkeys (top two panels in Figure 8, Table 2). These results appear to agree with findings in human studies, which suggest that bottom-up saliency plays a minimal role in explaining saccades during search tasks (e.g., Einhäuser et al., 2008). However, it must be noted that if edge–energy is a bottom-up feature, then bottom-up features appear to play an important role in gaze prediction even during search.

Given the demands of the search task (finding a Gabor target in a natural scene), it is worth considering the possibility that edge–energy is not strictly a bottom-up feature, but that locations with high edge–energy are sought after by the monkeys simply because Gabor wavelets have high edge–energy. Although seeking high-energy patches is one possible search strategy, our Fourier analysis (Figure 5) and ROC analysis (Figure 6) suggest that target similarity (relevance) or target orientation also plays a role. Indeed, when using multivariate decoding, we found that task-relevant features had nonzero effects (bottom three panels in Figure 8, Table 2). In particular, relevance played an important role for MAS15 but not MAS16, whereas the reverse was true for orientedness (Figure 8, panels 3 and 4 from the top; Table 2). On the basis of this result, it is not possible to conclude whether the internal search template guiding top-down saccade choice is composed of the target in its entirety (relevance), its orientedness, or the target orientation (verticalness); no single strategy wins out across data from both animals and tasks, but all seem to be important.

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategy over consecutive days, monkey MAS15 had performed the vertical search task ahead of the horizontal task, and the order of tasks was swapped for MAS16. Since these tasks were separated by a few weeks, we had the opportunity to compare search strategy across tasks. Indeed, for the 2° × 2° window, we observed a significant difference in the effect size of verticalness (a feature that captures the predominant orientation) across the two tasks. In particular, for MAS15, who had performed the horizontal task later, verticalness is significantly predictive of saccades in the horizontal task, with shuffled-control patches being more vertical than fixated patches (β ± SE = −0.0546 ± 0.0161, p = 0.0009; see Table 2B and bottom panels in Figure 8). For MAS16, who had performed the vertical task later, verticalness is significantly predictive of saccades in the vertical task, with fixated patches being more vertical than shuffled-control patches (β ± SE = 0.1232 ± 0.0201, p < 0.0001; see Table 2C and bottom panels in Figure 8). This practice effect of verticalness also correlates with search performance as quantified by the number of fixations required to find the target. In particular, a Wilcoxon rank-sum test for different median numbers of fixations in the earlier and later tasks suggests that both monkeys improve across tasks (p < 0.00001, z = 29.1, median of 6 vs. 4 fixations for MAS15; and p < 0.00001, z = 8.8, median of 8 vs. 7 fixations for MAS16). From these results, it appears that animals gradually learn to place more emphasis on the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.
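The across-task improvement is a comparison of per-trial fixation counts with a Wilcoxon rank-sum test. A minimal Matlab sketch, with hypothetical vectors of fixation counts for the earlier and later tasks (nFixEarlier, nFixLater) standing in for the actual data, would be:

% Compare per-trial fixation counts between the earlier and later search tasks.
% nFixEarlier, nFixLater: number of fixations needed to find the target in each
% successful trial of the earlier and later task, respectively.
[p, ~, stats] = ranksum(nFixEarlier, nFixLater);   % Wilcoxon rank-sum (Mann-Whitney U)
% stats.zval is reported for large samples, where the normal approximation is used
fprintf('medians: %g vs. %g, z = %.1f, p = %.2g\n', ...
        median(nFixEarlier), median(nFixLater), stats.zval, p);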

Table 1. Decoding accuracies (mean percentage of correctly predicted patches and standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.

                          Retinal transform   Retinal transform   Retinal transform   No retinal transform
                          1° × 1°             2° × 2°             4° × 4°             2° × 2°
MAS15, vertical task      55.7 ± 0.9          55.5 ± 0.6          56.8 ± 0.5          61.5 ± 0.7
MAS15, horizontal task    56.5 ± 0.6          56.4 ± 0.8          56.6 ± 0.8          64.3 ± 1.0
MAS16, vertical task      58.4 ± 0.6          60.2 ± 1.3          60.5 ± 1.4          62.9 ± 0.7
MAS16, horizontal task    59.2 ± 0.8          60.9 ± 0.6          62.3 ± 1.3          62.3 ± 1.0




Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance, and how they evolved over time, when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant features while allowing for bottom-up features to explain them away. We found that, over the short time scale of a single trial, monkeys preferred to fixate locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades. As the trial progressed, locations of higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated. Over longer time scales of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R2 on the test set) of features are shown for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.



Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and p values, along with log-likelihoods, chi-squared (χ2) statistics, and pseudo-R2s (R_D^2), as well as likelihood ratios for all leave-one-feature-out partial models.

Part A: MAS15, vertical target. LR χ2(6) = 704.4, p = 0; log-likelihood = −23,992; R_D^2 = 0.0143 (standardized pseudo-R2, test set).

Variable       β         SE       t          p          Likelihood ratio   p
Constant       0.0180    0.0108   1.6706     0.0954
Saliency       0.0085    0.0108   0.7809     0.4503     0.6797             0.9442
Edge-energy    0.2612    0.0131   19.8995    <0.0001    455.2963           0
Relevance      0.1154    0.0133   8.6978     <0.0001    83.1493            <0.0001
Orientedness   0.0268    0.0120   2.2281     0.0319     5.0343             0.3035
Verticalness   0.0096    0.0126   0.7664     0.4607     0.6666             0.9450

Part B: MAS15, horizontal target. LR χ2(6) = 704.4, p = 0; log-likelihood = −16,714; R_D^2 = 0.0203 (standardized pseudo-R2, test set).

Variable       β         SE       t          p          Likelihood ratio   p
Constant       0.0337    0.0130   2.5918     0.0097
Saliency       0.0287    0.0131   2.1859     0.0341     4.8705             0.3187
Edge-energy    0.2880    0.0177   16.2238    <0.0001    316.5875           0
Relevance      0.1889    0.0176   10.7034    <0.0001    131.5860           0
Orientedness   0.0359    0.0163   2.2012     0.0417     4.9799             0.3190
Verticalness   −0.0546   0.0161   −3.3997    0.0009     11.6278            0.0258

Part C: MAS16, vertical target. LR χ2(6) = 769.5, p = 0; log-likelihood = −10,576; R_D^2 = 0.0346 (standardized pseudo-R2, test set).

Variable       β         SE       t          p          Likelihood ratio   p
Constant       0.0861    0.0165   5.2211     <0.0001
Saliency       0.0032    0.0164   0.1958     0.7475     0.1470             0.9953
Edge-energy    0.5386    0.0253   21.2555    <0.0001    584.6116           0
Relevance      0.0380    0.0197   1.9298     0.0621     3.9210             0.4398
Orientedness   0.1494    0.0198   7.5346     <0.0001    57.1285            <0.0001
Verticalness   0.1232    0.0201   6.1210     <0.0001    37.5927            <0.0001

Part D: MAS16, horizontal target. LR χ2(6) = 1099.4, p = 0; log-likelihood = −14,001; R_D^2 = 0.0372 (standardized pseudo-R2, test set).

Variable       β         SE       t          p          Likelihood ratio   p
Constant       0.0775    0.0144   5.3812     <0.0001
Saliency       0.0069    0.0143   0.4846     0.6132     0.4056             0.9694
Edge-energy    0.6241    0.0236   26.4207    <0.0001    953.3296           0
Relevance      0.0025    0.0164   0.1493     0.7738     0.1008             0.9982
Orientedness   0.0707    0.0178   3.9771     0.0002     15.9725            0.0059
Verticalness   0.0224    0.0173   1.2956     0.2320     1.8565             0.7618



The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 (Figure 8), which, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or the other features.

One limitation of our approach is that although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between certain visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies, we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings in monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans, but they generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). They also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed a clear difference in power spectra between fixated and nonfixated patches, and also between these spectral differences across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note to the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished while that of task-relevant features is enhanced during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and that the influence of bottom-up features is diminished during search, our study might not be ideal for examining the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, to model peripheral degradation for feature extraction and to deploy an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their predictive power of fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that maximizes an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness/horizontalness).

Conclusions

We studied the problem of why we look where we do by examining search under naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time, and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority, and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19. [PubMed] [Article]
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.
Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.
Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34. [PubMed] [Article]
Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.
Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.
Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.
Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.
Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.
Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2. [PubMed] [Article]
Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.
Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.
Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.
Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.
Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]
Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98 Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.
Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.
Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. L. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.
Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.
Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]
Itti, L., & Koch, C. (2001). Computational modelling of visual attention. Nature Reviews Neuroscience, 2, 194–203.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.
Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.
Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]
Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.
Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.
Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]
Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.
Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]
Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).
Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]
Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.
Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]
Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.
't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.
Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]
Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.
Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.
Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.
Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatiotemporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of a successful outcome given by

p(y = 1 \mid X\beta) = \frac{1}{1 + e^{-X\beta}}    (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

y \sim \mathrm{Bernoulli}\{p(y = 1 \mid X\beta)\}    (A2)

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X contains one of the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients β that minimize the negative log likelihood of the model, given by

L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i)    (A3)


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
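As an illustration, a minimal Matlab sketch of this fit, using glmfit on an assumed feature matrix X (one row per patch, one column per feature) and label vector y (1 = fixated, 0 = shuffled control), might look as follows; the variable names are placeholders rather than the authors' code.

% Fit the multivariate logistic regression and read out a regression table
% like Table 2: coefficient estimates, standard errors, t scores, and p values.
[b, dev, stats] = glmfit(X, y, 'binomial', 'link', 'logit');   % b(1) is the bias (constant) term
regressionTable = [b, stats.se, stats.t, stats.p];             % one row per predictor, plus the constant
disp(array2table(regressionTable, 'VariableNames', {'beta', 'SE', 't', 'p'}));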

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

D_0 = -2\log(L_0 / L_{\mathrm{sat}})    (A4)

D = -2\log(L / L_{\mathrm{sat}})    (A5)

where L_0, L_{\mathrm{sat}}, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

D - D_0 = -2\log(L / L_0)    (A6)

A negative likelihood ratio suggests that the model under consideration significantly outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ2 measure, given the number of independent variables in the model.
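Assuming X and y are defined as in the sketch above, the deviance-based comparison against the bias-only null model can be written in Matlab roughly as follows (glmfit returns the model deviance directly); this is a sketch, not the authors' implementation.

% Likelihood-ratio (deviance) comparison of the full model against a bias-only null model.
[~, devFull] = glmfit(X, y, 'binomial');                                % D for the full model
[~, devNull] = glmfit(ones(size(y)), y, 'binomial', 'constant', 'off'); % D0 for the intercept-only model
LR   = devNull - devFull;            % equals 2*log(L/L0), i.e., -(D - D0) from Equation A6
pval = 1 - chi2cdf(LR, size(X, 2));  % chi-squared test, df = number of features added to the null model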

Relative pseudo-R2

A metric related to the likelihood ratio is the pseudo-R2. The idea of the pseudo-R2 metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R2 for normally distributed data.

Many definitions exist for the pseudo-R2, but we used McFadden's formula, R_D^2 = 1 - L/L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature, or of a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets and stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R2 of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R2 (Fernandes et al., 2014). The relative measure is defined as 1 - L_{full}/L_{partial}, where L_{full} and L_{partial} are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R2 in each case was computed on the held-out test set.
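A minimal Matlab sketch of the relative pseudo-R2 for one feature, on a single train/test split, follows. The indices trainIdx and testIdx are placeholders (e.g., one fold of the cvpartition used earlier), X and y are as assumed above, and the feature index j is hypothetical; this illustrates the definition rather than reproducing the authors' code.

% Relative pseudo-R2 for feature j: 1 - LLfull/LLpartial on held-out data,
% where the partial model omits column j of the feature matrix.
bernLL = @(y, p) sum(y .* log(p) + (1 - y) .* log(1 - p));   % Bernoulli log-likelihood

j    = 2;                          % e.g., edge-energy (hypothetical column index)
keep = setdiff(1:size(X, 2), j);   % all features except feature j

bFull = glmfit(X(trainIdx, :),    y(trainIdx), 'binomial');
bPart = glmfit(X(trainIdx, keep), y(trainIdx), 'binomial');

llFull = bernLL(y(testIdx), glmval(bFull, X(testIdx, :),    'logit'));
llPart = bernLL(y(testIdx), glmval(bPart, X(testIdx, keep), 'logit'));

relPseudoR2 = 1 - llFull / llPart;   % > 0 when adding feature j improves the test-set likelihood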


Evolution of search strategy within a trial

Our goal was to study the temporal evolution ofsearch strategy at multiple time scales To understandhow saccade planning evolved at the relatively shorttime scale of within a single trial we correlated thevisual features at fixation with the fixation order (firstsecond third etc) and the length of the saccade madeto land at that fixation We observed that fixatedlocations had different features at early and latefixations In particular saliency edge-energy rele-vance orientedness and verticalness of fixated patchesincreased from the beginning to the end of the trial(Figure 7 and Supplementary Figure S3 filled blackdiamonds p 00001 for both tasks) We also foundthat saliency edge-energy relevance and verticalness offixated patches were negatively correlated p 00001for both animals and tasks with the length of thesaccade made to get to that location (Figure 7 andSupplementary Figure S2 filled blue squares) Bycontrast orientedness was positively correlated withfixation number and saccade length (Figure 7 andSupplementary Figure S2) These effects are abolishedas expected when the same analysis is performed withshuffled controls instead of fixated patches (unfilledblack diamonds and blue squares in Figure 7 andSupplementary Figure S2) Crucially these effects areonly consistent across animals and tasks when a retinaltransform is applied (last data point in all panels ofFigure 7 and Supplementary Figure S2) We furtherverified that these phenomena are independent that isalthough patches of high relevance or edge-energy arefixated early on in the trial by short saccades this is notmerely because early saccades are short on average Inparticular we found significant but very weak corre-lations between fixation order and saccade length foreither monkey (across tasks r frac14003 for the verticaltask and r frac14 002 for the horizontal task p 001 for

both correlations) which were not sufficient to explainthe high correlations observed with visual features atfixation Together these results show how searchstrategy evolves during the trial with predictive visualfeatures being fixated later in the trial

Multivariate analysis of the evolution of searchstrategy over days

Although ROC analysis suggests that saliencyrelevance and edgendashenergy can predict fixations onefundamental limitation of the technique is that it canonly quantify the effect of one variable at a timeHowever in practice these variables tend to becorrelated In our data for instance relevance is clearlycorrelated with edgendashenergy (q frac14 025 p 00001 16sessions two animals 28 middot 28 patch with retinaltransform 57775 fixations) and edgendashenergy is corre-lated with saliency q frac14 002 p 00001 Thecorrelation is problematic because the true effect ofedgendashenergy might manifest as an apparent effect ofsaliency or relevance To address this concern weapplied multivariate logistic regression to decodefixated from nonfixated patches as a function of localvisual features (see Methods)

Visual features could discriminate fixated fromnonfixated patches

Using multivariate logistic regression we asked iffixated patches could be decoded from shuffled-controlpatches We found that fixated patches and shuffledcontrols could be decoded reliably above chance level(50) Decoding accuracies did not differ significantlyacross tasks (vertical vs horizontal target) Fixatedpatches were better decoded when peripheral visual

Figure 6 Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distribution of visual features at fixated

patches and shuffled controls across approximately 55000 fixations (pooled across 16 sessions from two monkeys) for each task and

each patch size Black bars represent the vertical Gabor task and red curves represent the horizontal task Error bars show

bootstrapped 95 confidence intervals

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 10

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Figure 7 How does search strategy evolve within a trial Spearmanrsquos rank correlation coefficients between fixation order (black

diamonds) or saccade length (blue squares) and visual features saliency edgendashenergy relevance orientedness and verticalness are

shown for one monkey (MAS15) for each task Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 11

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

patches were not degraded by the retinal transform(last column of Table 1)

Relative influence of visual features on saccade choice

Having established that the visual features at fixationcould predict eye movement behavior we were interestedin the relative influence of individual features Bydecoding fixated patches from shuffled controls in thetwo tasks we found that edgendashenergy was predominantlymore influential than all other features in predictingfixations and saliency was completely explained away forboth monkeys (top two panels in Figure 8 Table 2)These results appear to agree with findings in humanstudies which suggest that bottom-up saliency plays aminimal role in explaining saccades during search tasks(eg Einhauser et al 2008) However it must be notedthat if edgendashenergy is a bottom-up feature then bottom-up features appear to play an important role in gazeprediction even during search

Given the demands of the search task (finding aGabor target in a natural scene) it is worth consideringthe possibility that edgendashenergy is not strictly a bottom-up feature but that locations with high edgendashenergy aresought after by the monkeys simply because Gaborwavelets have high edgendashenergy Although seekinghigh-energy patches is one possible search strategy ourFourier analysis (Figure 5) and ROC analysis (Figure6) suggest that target similarity (relevance) or targetorientation also plays a role Indeed when usingmultivariate decoding we found that task-relevantfeatures had nonzero effects (bottom three panels inFigure 8 Table 2) In particular relevance played animportant role for MAS15 but not MAS16 whereasthe reverse was true for orientedness (Figure 8 panels 3and 4 from above Table 2) On the basis of this resultit is not possible to conclude whether the internalsearch template guiding top-down saccade choice is

comprised of the target in its entirety (relevance)orientedness or the target orientation (verticalness)one dominant strategy does not win out across datafrom both animals and tasks but all seem to beimportant

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategyover consecutive days monkey MAS15 had performedthe vertical search task ahead of the horizontal taskand the order of tasks was swapped for MAS16 Sincethese tasks were separated by a few weeks we had theopportunity to compare search strategy across tasksIndeed for the 28 middot 28 window we observed asignificant difference in the effect size of verticalness (afeature that captures the predominant orientation)across the two tasks In particular for MAS15 whohad performed the horizontal task later verticalness issignificantly predictive of saccades in the horizontaltask with shuffled-control patches being more verticalthat fixated patches (b 6 SEfrac1400546 6 00161 p frac1400009 see Table 2B and bottom panels in Figure 8)For MAS16 who had performed the vertical task laterverticalness is significantly predictive of saccades in thevertical task with fixated patches being more verticalthan shuffled-control patches (b 6 SE frac14 01232 600201 p 00001 see Table 2C and bottom panels inFigure 8) This practice effect of verticalness alsocorrelates with search performance as quantified bynumber of fixations required to find the target Inparticular a Wilcoxon rank-sum test for differentmedian number of fixations for earlier and later taskssuggests that both monkeys improve across tasks (p 000001 z frac14 291 median of 6 vs 4 fixations forMAS15 and p 000001 zfrac14 88 median of 8 vs 7fixations for MAS16) From these results it appearsthat animals gradually learn to place more emphasis on

Retinal transform

18 middot 18

Retinal transform

28 middot 28

Retinal transform

48 middot 48

No retinal transform

28 middot 28

MAS15 vertical task 557 6 09 555 6 06 568 6 05 615 6 07

MAS15 horizontal task 565 6 06 564 6 08 566 6 08 643 6 10

MAS16 vertical task 584 6 06 602 6 13 605 6 14 629 6 07

MAS16 horizontal task 592 6 08 609 6 06 623 6 13 623 6 10

Table 1 Decoding accuracies (mean percentage of correctly predicted patches and standard deviation across 10 cross-validation folds)for fixated patches versus shuffled controls using all bottom-up and task-relevant visual features as inputs to a logistic regression

Error bars show bootstrapped 95 confidence intervals The symbols on the x-axis indicate the image processing done prior to

feature computation the first three data points correspond to a retinal transform followed by averaging over 18 middot 18 28 middot 28 and 48

middot 48 windows respectively the fourth data point indicates averaging over a 28 middot 28 window without applying any retinal transform

The star indicates a significance level of p 00001

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 12

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

the target orientation over a period of several weeks of

performing the two related tasks Furthermore this

long time-scale practice effect is not consistently

detected by our analysis if we do not apply a retinal

transform (bottom panels of Figure 8) suggesting once

again that a biologically realistic model of peripheral

vision is essential to understand eye movement

behavior during search

Discussion

We set out to examine the contents of the visualpriority map employed in saccade guidance and howthey evolved over time when monkeys searched forGabor targets embedded in grayscale natural scenesWe applied multivariate analysis technique to tease outthe relative contributions of various task-relevant

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features are shown for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.



Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and log-likelihoods, chi-squared (χ²) statistics, and pseudo-R²s (R²_D), as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A: MAS15, vertical target. LR χ²(6) = 70.44, p = 0; log-likelihood = -2399.2; standardized pseudo-R² (test set) R²_D = 0.0143.

| Variable | b | SE | t | p | Likelihood ratio | p |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.0180 | 0.0108 | 1.6706 | 0.0954 | | |
| Saliency | 0.0085 | 0.0108 | 0.7809 | 0.4503 | 0.6797 | 0.9442 |
| **Edge–energy** | 0.2612 | 0.0131 | 19.8995 | <0.0001 | 455.2963 | 0 |
| **Relevance** | 0.1154 | 0.0133 | 8.6978 | <0.0001 | 83.1493 | <0.0001 |
| Orientedness | 0.0268 | 0.0120 | 2.2281 | 0.0319 | 5.0343 | 0.3035 |
| Verticalness | 0.0096 | 0.0126 | 0.7664 | 0.4607 | 0.6666 | 0.9450 |

Part B: MAS15, horizontal target. LR χ²(6) = 70.44, p = 0; log-likelihood = -1671.4; standardized pseudo-R² (test set) R²_D = 0.0203.

| Variable | b | SE | t | p | Likelihood ratio | p |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.0337 | 0.0130 | 2.5918 | 0.0097 | | |
| Saliency | 0.0287 | 0.0131 | 2.1859 | 0.0341 | 4.8705 | 0.3187 |
| **Edge–energy** | 0.2880 | 0.0177 | 16.2238 | <0.0001 | 316.5875 | 0 |
| **Relevance** | 0.1889 | 0.0176 | 10.7034 | <0.0001 | 131.5860 | 0 |
| Orientedness | 0.0359 | 0.0163 | 2.2012 | 0.0417 | 4.9799 | 0.3190 |
| **Verticalness** | 0.0546 | 0.0161 | 3.3997 | 0.0009 | 11.6278 | 0.0258 |

Part C: MAS16, vertical target. LR χ²(6) = 76.95, p = 0; log-likelihood = -1057.6; standardized pseudo-R² (test set) R²_D = 0.0346.

| Variable | b | SE | t | p | Likelihood ratio | p |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.0861 | 0.0165 | 5.2211 | <0.0001 | | |
| Saliency | 0.0032 | 0.0164 | 0.1958 | 0.7475 | 0.1470 | 0.9953 |
| **Edge–energy** | 0.5386 | 0.0253 | 21.2555 | <0.0001 | 584.6116 | 0 |
| Relevance | 0.0380 | 0.0197 | 1.9298 | 0.0621 | 3.9210 | 0.4398 |
| **Orientedness** | 0.1494 | 0.0198 | 7.5346 | <0.0001 | 57.1285 | <0.0001 |
| **Verticalness** | 0.1232 | 0.0201 | 6.1210 | <0.0001 | 37.5927 | <0.0001 |

Part D: MAS16, horizontal target. LR χ²(6) = 109.94, p = 0; log-likelihood = -1400.1; standardized pseudo-R² (test set) R²_D = 0.0372.

| Variable | b | SE | t | p | Likelihood ratio | p |
| --- | --- | --- | --- | --- | --- | --- |
| Constant | 0.0775 | 0.0144 | 5.3812 | <0.0001 | | |
| Saliency | 0.0069 | 0.0143 | 0.4846 | 0.6132 | 0.4056 | 0.9694 |
| **Edge–energy** | 0.6241 | 0.0236 | 26.4207 | <0.0001 | 953.3296 | 0 |
| Relevance | 0.0025 | 0.0164 | 0.1493 | 0.7738 | 0.1008 | 0.9982 |
| **Orientedness** | 0.0707 | 0.0178 | 3.9771 | 0.0002 | 15.9725 | 0.0059 |
| Verticalness | 0.0224 | 0.0173 | 1.2956 | 0.2320 | 1.8565 | 0.7618 |
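Each part of Table 2 is a standard logistic-regression summary plus a leave-one-feature-out likelihood-ratio test. The authors fit these models with Matlab's glmfit; the sketch below reproduces the same style of output with statsmodels on synthetic data (the feature names only mirror the table, the numbers are placeholders).

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

features = ["saliency", "edge_energy", "relevance", "orientedness", "verticalness"]

# Synthetic stand-in data: rows are patches, y = 1 for fixated, 0 for control.
rng = np.random.default_rng(2)
X = rng.standard_normal((4000, len(features)))
y = (rng.random(4000) < 1 / (1 + np.exp(-X[:, 1]))).astype(int)

Xc = sm.add_constant(X)
full = sm.Logit(y, Xc).fit(disp=0)
for name, b, se, t, p in zip(["constant"] + features, full.params, full.bse,
                             full.tvalues, full.pvalues):
    print(f"{name:>12s}: b = {b: .4f}, SE = {se:.4f}, t = {t: .2f}, p = {p:.4g}")

# Leave-one-feature-out likelihood-ratio test (rightmost columns of Table 2).
for j, name in enumerate(features):
    keep = [0] + [k + 1 for k in range(len(features)) if k != j]  # keep the constant
    partial = sm.Logit(y, Xc[:, keep]).fit(disp=0)
    lr = 2 * (full.llf - partial.llf)
    print(f"{name:>12s}: likelihood ratio = {lr:.2f}, p = {chi2.sf(lr, df=1):.4g}")
```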



The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 (Figure 8), which, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or the other features.
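One way to check the explanation offered above would be to compare the spatial spread of each feature's 2-D autocorrelation around fixation. The sketch below computes such an autocorrelation via the Wiener-Khinchin theorem; `feature_map` is a placeholder array, not one of the paper's feature maps.

```python
import numpy as np

def autocorrelation_2d(feature_map):
    """2-D autocorrelation of a feature map via the Wiener-Khinchin theorem."""
    x = feature_map - feature_map.mean()
    power = np.abs(np.fft.fft2(x)) ** 2
    ac = np.fft.fftshift(np.real(np.fft.ifft2(power)))
    return ac / ac.max()  # peak normalized to 1 at zero lag

# Placeholder "edge-energy" map; a broader central peak would indicate a feature
# that is more spatially spread out around fixated locations.
rng = np.random.default_rng(3)
feature_map = rng.standard_normal((128, 128))
ac = autocorrelation_2d(feature_map)
central_row = ac[ac.shape[0] // 2]
print("pixels above half height along the central row:", int((central_row > 0.5).sum()))
```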

One limitation of our approach is that, although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between certain visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies, we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings on monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce the statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans, but they generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that the principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that, when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). That study also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed a clear difference in power spectra between fixated and nonfixated patches, and this spectral difference itself differed between the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of the potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but that this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note for the interpretation of the power-spectral differences that we observed in our analysis.
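For reference, the power-spectral comparison discussed here amounts to a per-frequency t test between the log power spectra of fixated and control patches. A hedged sketch, with synthetic patch stacks standing in for the real image patches:

```python
import numpy as np
from scipy.stats import ttest_ind

def log_power_spectra(patches):
    """Log 2-D Fourier power for a stack of square patches with shape (n, h, w)."""
    spectra = np.abs(np.fft.fftshift(np.fft.fft2(patches), axes=(-2, -1))) ** 2
    return np.log(spectra + 1e-12)

# Synthetic stand-ins for fixated patches and position-shuffled controls.
rng = np.random.default_rng(4)
fixated = rng.standard_normal((500, 32, 32))
controls = rng.standard_normal((500, 32, 32))

# t statistic per frequency bin; positive values mean more power at fixated patches.
t_map, _ = ttest_ind(log_power_spectra(fixated), log_power_spectra(controls), axis=0)
print("t map shape:", t_map.shape, "| mean t:", round(float(t_map.mean()), 3))
```

As the paragraph above cautions, such a second-order comparison is blind to the higher-order structure captured by bispectral densities.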

A number of studies in humans support our finding that the influence of bottom-up features is diminished, while that of task-relevant features is enhanced, during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding, with invasive electrophysiology, the neural circuits that compute these decisions. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and that the influence of bottom-up features is diminished during search, our study might not be ideal for examining the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, which models peripheral degradation for feature extraction and deploys an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their power to predict fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that maximizes an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness/horizontalness).
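The paper's relevance metric is defined in the Methods; purely as a generic illustration of matching local image content against a Gabor search template, here is a small cross-correlation sketch on a placeholder scene (the Gabor parameters and normalization are illustrative, not the authors' exact definition).

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_template(size=21, wavelength=6.0, sigma=4.0):
    """Gabor patch whose carrier varies along x, i.e., vertically oriented stripes."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return envelope * np.cos(2 * np.pi * x / wavelength)

# Placeholder grayscale scene; in the experiment this would be a natural image
# with a Gabor target embedded in it.
rng = np.random.default_rng(5)
scene = rng.standard_normal((256, 256))

template = gabor_template()
template -= template.mean()
# Flipping the kernel turns convolution into cross-correlation with the template.
match_map = fftconvolve(scene, template[::-1, ::-1], mode="same")
peak = np.unravel_index(np.argmax(np.abs(match_map)), match_map.shape)
print("most template-like location (row, col):", peak)
```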

Conclusions

We studied the problem of why we look where we do by studying search under naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork for examining, with neurophysiological methods in nonhuman primate models, the neural representation of priority and the neural mechanisms by which a saccade decision is computed.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19.
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.
Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.
Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34.
Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.
Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.
Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.
Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.
Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.
Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2.
Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.
Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.
Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.
Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.
Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533.
Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98: Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.
Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.
Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. L. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.
Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.
Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25.
Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.
Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.
Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7.
Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.
Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.
Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4.
Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.
Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7.
Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).
Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13.
Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.
Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6.
Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.
't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.
Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4.
Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.
Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.
Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.
Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.
Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatiotemporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of a successful outcome given by

$$p(y = 1 \mid X, b) = \frac{1}{1 + e^{-Xb}} \qquad \text{(A1)}$$

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

$$y \sim \text{Bernoulli}\{p(y = 1 \mid X, b)\} \qquad \text{(A2)}$$

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients b that minimize the negative log-likelihood of the model, given by

$$\mathcal{L} = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i) \qquad \text{(A3)}$$


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
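As a rough Python counterpart to what glmfit does internally, the following sketch minimizes the negative log-likelihood of Equation A3 with a gradient-based optimizer; the design matrix and coefficients are synthetic, not the paper's data.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    """Negative log-likelihood of the logistic model (Equation A3)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def gradient(beta, X, y):
    """Gradient of the negative log-likelihood: X^T (p - y)."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y)

# Synthetic design matrix with a bias column, as described above.
rng = np.random.default_rng(6)
X = np.column_stack([np.ones(2000), rng.standard_normal((2000, 5))])
true_beta = np.array([0.0, 0.0, 0.3, 0.1, 0.0, 0.05])
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

fit = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=gradient, method="L-BFGS-B")
print("estimated coefficients:", np.round(fit.x, 3))
```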

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

$$D_0 = -2\log(L_0 / L_{\text{sat}}) \qquad \text{(A4)}$$

$$D = -2\log(L / L_{\text{sat}}) \qquad \text{(A5)}$$

where $L_0$, $L_{\text{sat}}$, and $L$ are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

$$D - D_0 = -2\log(L / L_0) \qquad \text{(A6)}$$

A negative likelihood ratio suggests that the model under consideration outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
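In practice, the conversion from a likelihood ratio to a χ² p value is one line. The sketch below uses illustrative log-likelihood values: the fitted model's value echoes Table 2, Part A, while the null (bias-only) value is an assumed placeholder chosen only so that the ratio is in the same ballpark.

```python
from scipy.stats import chi2

# Illustrative log-likelihoods: the null value is an invented placeholder.
loglik_model = -2399.2
loglik_null = -2434.4
n_params_added = 6  # parameters beyond the null model, as in chi2(6) of Table 2

likelihood_ratio = 2 * (loglik_model - loglik_null)  # = -(D - D0) of Equation A6
p_value = chi2.sf(likelihood_ratio, df=n_params_added)
print(f"LR chi2({n_params_added}) = {likelihood_ratio:.2f}, p = {p_value:.2g}")
```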

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, $R_D^2 = 1 - L/L_0$, where L is the log-likelihood of the model under consideration and $L_0$ is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2013). The relative measure is defined as $1 - L_{\text{full}}/L_{\text{partial}}$, where $L_{\text{full}}$ and $L_{\text{partial}}$ are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
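A minimal sketch of the relative pseudo-R² computation described above, assuming placeholder feature and label arrays and using scikit-learn's (lightly regularized) logistic regression as a stand-in for glmfit:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def test_log_likelihood(model, X, y):
    """Bernoulli log-likelihood of a fitted classifier on held-out data."""
    p = model.predict_proba(X)[:, 1]
    eps = 1e-12
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Placeholder data: five feature columns; y = 1 for fixated patches, 0 for controls.
rng = np.random.default_rng(7)
X = rng.standard_normal((4000, 5))
y = (rng.random(4000) < 1 / (1 + np.exp(-0.5 * X[:, 1]))).astype(int)

leave_out = 1  # index of the feature whose relative effect size we want
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
relative_r2 = []
for train, test in folds.split(X, y):
    cols = [j for j in range(X.shape[1]) if j != leave_out]
    full = LogisticRegression().fit(X[train], y[train])
    partial = LogisticRegression().fit(X[train][:, cols], y[train])
    l_full = test_log_likelihood(full, X[test], y[test])
    l_partial = test_log_likelihood(partial, X[test][:, cols], y[test])
    relative_r2.append(1 - l_full / l_partial)  # relative pseudo-R^2 on the test set

se = np.std(relative_r2) / np.sqrt(len(relative_r2))
print(f"relative pseudo-R^2 = {np.mean(relative_r2):.4f} +/- {se:.4f} (SE over folds)")
```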



Evolution of search strategy over multiple weeks

Although we did not find a change in search strategyover consecutive days monkey MAS15 had performedthe vertical search task ahead of the horizontal taskand the order of tasks was swapped for MAS16 Sincethese tasks were separated by a few weeks we had theopportunity to compare search strategy across tasksIndeed for the 28 middot 28 window we observed asignificant difference in the effect size of verticalness (afeature that captures the predominant orientation)across the two tasks In particular for MAS15 whohad performed the horizontal task later verticalness issignificantly predictive of saccades in the horizontaltask with shuffled-control patches being more verticalthat fixated patches (b 6 SEfrac1400546 6 00161 p frac1400009 see Table 2B and bottom panels in Figure 8)For MAS16 who had performed the vertical task laterverticalness is significantly predictive of saccades in thevertical task with fixated patches being more verticalthan shuffled-control patches (b 6 SE frac14 01232 600201 p 00001 see Table 2C and bottom panels inFigure 8) This practice effect of verticalness alsocorrelates with search performance as quantified bynumber of fixations required to find the target Inparticular a Wilcoxon rank-sum test for differentmedian number of fixations for earlier and later taskssuggests that both monkeys improve across tasks (p 000001 z frac14 291 median of 6 vs 4 fixations forMAS15 and p 000001 zfrac14 88 median of 8 vs 7fixations for MAS16) From these results it appearsthat animals gradually learn to place more emphasis on

Retinal transform

18 middot 18

Retinal transform

28 middot 28

Retinal transform

48 middot 48

No retinal transform

28 middot 28

MAS15 vertical task 557 6 09 555 6 06 568 6 05 615 6 07

MAS15 horizontal task 565 6 06 564 6 08 566 6 08 643 6 10

MAS16 vertical task 584 6 06 602 6 13 605 6 14 629 6 07

MAS16 horizontal task 592 6 08 609 6 06 623 6 13 623 6 10

Table 1 Decoding accuracies (mean percentage of correctly predicted patches and standard deviation across 10 cross-validation folds)for fixated patches versus shuffled controls using all bottom-up and task-relevant visual features as inputs to a logistic regression

Error bars show bootstrapped 95 confidence intervals The symbols on the x-axis indicate the image processing done prior to

feature computation the first three data points correspond to a retinal transform followed by averaging over 18 middot 18 28 middot 28 and 48

middot 48 windows respectively the fourth data point indicates averaging over a 28 middot 28 window without applying any retinal transform

The star indicates a significance level of p 00001

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 12

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

the target orientation over a period of several weeks of

performing the two related tasks Furthermore this

long time-scale practice effect is not consistently

detected by our analysis if we do not apply a retinal

transform (bottom panels of Figure 8) suggesting once

again that a biologically realistic model of peripheral

vision is essential to understand eye movement

behavior during search

Discussion

We set out to examine the contents of the visualpriority map employed in saccade guidance and howthey evolved over time when monkeys searched forGabor targets embedded in grayscale natural scenesWe applied multivariate analysis technique to tease outthe relative contributions of various task-relevant

Figure 8 Relative influence of visual features on gaze behavior during search The influence of each feature on eye gaze was obtained

by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature

Effect sizes (relative pseudo-R2 on the test set) of features for the vertical (black) and horizontal (red) Gabor search tasks for each

monkey Error bars show standard errors of mean across 10 cross-validation folds

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 13

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

features while allowing for bottom-up features toexplain them away We found that over short timescales of a single trial monkeys preferred to fixatetarget locations with high saliency edgendashenergyrelevance and verticalness with short saccades andlocations with high orientedness with long saccades

As the trial progressed locations of higher bottom-upfeatures as well those of higher relevance for the searchtask were more likely to be fixated Over longer timescales of weeks animals were able to adapt their searchstrategy to seek our task-relevant features at saccadetargets and this improvement in strategy was associ-

Variable b SE t p Likelihood ratio p

Constant 00180 00108 16706 00954

Saliency 00085 00108 07809 04503 06797 09442

Edge-energy 02612 00131 198995 00001 4552963 0

Relevance 01154 00133 86978 00001 831493 00001

Orientedness 00268 00120 22281 00319 50343 03035

Verticalness 00096 00126 07664 04607 06666 09450

Table 2 Regression tables for logistic regression decoding of 28 middot 28 fixated patches and shuffled controls after the retinal transformwas applied Separate regression tables for each animal (MAS15 MAS16) and task (vertical or horizontal Gabor search) providecoefficient estimates standard errors t scores and log-likelihoods chi-squared (v2) statistics and pseudo-R2s (RD

2) as well aslikelihood ratios for all leave-one-feature-out partial models Significant features are emphasized in boldface

Part A MAS15 Vertical target LR v2(6)frac14 7044 pfrac14 0 log-likelihoodfrac1423992 RD2frac14 00143 Standardized Pseudo-R2 (test

set)

Variable b SE t p Likelihood ratio p

Constant 00337 00130 25918 00097

Saliency 00287 00131 21859 00341 48705 03187

Edge-energy 02880 00177 162238 00001 3165875 0

Relevance 01889 00176 107034 00001 1315860 0

Orientedness 00359 00163 22012 00417 49799 03190

Verticalness 00546 00161 33997 00009 116278 00258

Part B MAS15 Horizontal target LR v2(6)frac14 7044 pfrac14 0 log-likelihoodfrac1416714 RD2frac14 00203 Standardized Pseudo-R2

(test set)

Variable b SE t p Likelihood ratio p

Constant 00861 00165 52211 00001

Saliency 00032 00164 01958 07475 01470 09953

Edge-energy 05386 00253 212555 00001 5846116 0

Relevance 00380 00197 19298 00621 39210 04398

Orientedness 01494 00198 75346 00001 571285 00001

Verticalness 01232 00201 61210 00001 375927 00001

Part C MAS16 Vertical target LR v2(6)frac14 7695 pfrac14 0 log-likelihoodfrac1410576 RD2frac14 00346 Standardized Pseudo-R2 (test

set)

Variable b SE t p Likelihood ratio p

Constant 00775 00144 53812 00001

Saliency 00069 00143 04846 06132 04056 09694

Edge-energy 06241 00236 264207 00001 9533296 0

Relevance 00025 00164 01493 07738 01008 09982

Orientedness 00707 00178 39771 00002 159725 00059

Verticalness 00224 00173 12956 02320 18565 07618

Part D MAS16 Horizontal target LR v2(6)frac14 10994 pfrac14 0 log-likelihoodfrac1414001 RD2frac14 00372 Standardized Pseudo-R2

(test set)

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 14

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

ated with statistically significantly fewer fixationsrequired to find the target in successful trials Cruciallyour findings only reveal themselves once realisticmodels of peripheral vision are taken into account

The multivariate decoding accuracies leave someanomalies to be explained In particular they suggestthat patches fixated by MAS16 the worse-performingmonkey are better discriminated than those fixated byMAS15 One potential explanation is that MAS16relies on edgendashenergy more than MAS15 (Figure 8)which as we have seen with the ROC analysis (Figure6) is the strongest predictor of fixations FurtherMAS16 show improved decoding with increased spatialextent of averaging This might be because of the 2-Dauto-correlation function of visual features aroundfixations that of edgendashenergy might be more spreadthan that of saliency or relevance or other features

One limitation of our approach is that although wehave used multivariate regression to explain awaycorrelated features our data do not establish a causalrelationship between certain visual features and gazeFor instance in our current design since Gabor targetshave high edgendashenergy and we have not parametricallyvaried edgendashenergy content across experimental ses-sions it is not possible to say whether edgendashenergy is abottom-up or a top-down factor By parametricallymanipulating individual features in the scene andcomparing gaze behavior across these conditions it ispossible to establish a more direct relationship betweenvisual features at fixation and factors that influencesaccade choice (see eg Einhauser et al 2008Engmann et al 2009 lsquot Hart Schmidt Roth ampEinhauser 2013) In future studies we intend to designexperiments of this nature

The natural world is seldom static and motion cuesare known to be predictive of gaze (Le Meur Le Calletamp Barba 2007) Therefore it might be argued thatsearching in static natural scenes may not be repre-sentative of natural eye movement behavior Howeverit was our intention to design tasks that are ascomparable as possible to studies in humans so thatour findings on monkeys would be directly comparable

For the stimuli in this study we have not taken intoaccount the effect of generic task-independent biasesthat might inform eye movement behavior Forinstance it has been shown that human subjects tend todemonstrate a world-centered bias by often lookingslightly below the horizon (Cristino amp Baddeley 2009)Future models would benefit from accounting for suchecological factors that bias gaze allocation

An implicit assumption of our analysis frameworkmdashstudying visual features at fixation to infer saccade-planning strategymdashis that saccades are always made toplanned locations that maximally resemble the targetthat is we assume that the oculomotor plan is a greedystrategy to maximize accuracy This assumption has a

couple of weaknesses First discrepancies betweendecision and action have been noted it is known thatsaccades are sometimes made to intermediate locationsbetween intended targets followed by subsequentcorrective saccades (Findlay 1982 Zelinsky 2008Zelinsky Rao Hayhoe amp Ballard 1997) Second ithas been proposed that saccades might be made tomaximize information (minimize Shannon entropy)rather than accuracy (Najemnik amp Geisler 2005)further both Bayes-optimal and heuristic searchmodels that maximize information seem to reproducestatistics of fixations made by human searchers betterthan a Bayesian model that only maximizes accuracy(Najemnik amp Geisler 2008 2009) These are importantphenomena that we have not addressed in our modelsof visual search

Analysis of visual statistics at fixation

Although a large majority of priority map models(see models reviewed in Borji et al 2013) have appliedheuristics based on the knowledge of the visual systema number of studies have analyzed visual informationat fixated locations with the goal of reverse-engineeringthe features that attract eye gaze These studies weredone in humans but generally agree with our nonhu-man primatersquos results We discuss a few pertinenthuman studies below in relation to our findings

We found that edge–energy predicts fixations well. Given that the principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that, when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). That study also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed a clear difference in power spectra between fixated and nonfixated patches, and this spectral difference itself differed between the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of the potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but that this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note for the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished, while that of task-relevant features is enhanced, during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman-primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and that the influence of bottom-up features is diminished during search, our study might not be ideal for examining the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye-movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales predicted fixation behavior better than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach of modeling peripheral degradation for feature extraction and of deploying an explaining-away technique for data analysis could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance, contrast, spatial frequency, orientation, and gradient of the search target) for their power to predict fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that quantifies the exact match between the Gabor target and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness, horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search under naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork for examining the neural representation of priority, and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavan.ramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.

Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19. [PubMed] [Article]

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.

Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.

Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34. [PubMed] [Article]

Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.

Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.

Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.

Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.

Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.

Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2. [PubMed] [Article]

Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.

Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.

Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.

Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]

Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98: Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.

Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.

Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.

Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. L. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.

Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.

Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]

Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.

Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.

Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]

Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.

Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.

Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.

Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.

Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.

Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]

Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.

Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).

Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 135, 370–384.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.

Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).

Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]

Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.

Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]

Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.

't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.

Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.

Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.

Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.

Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.

Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.

Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classification problems; the two classes in our problem are fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable, with the mean probability of a successful outcome given by

p(y = 1 \mid X, \beta) = \frac{1}{1 + e^{-X\beta}} \qquad (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

y \sim \mathrm{Bernoulli}\{ p(y = 1 \mid X, \beta) \} \qquad (A2)

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X contains one of the visual features extracted from the image patches centered at each fixation, and each row corresponds to a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients β that minimize the negative log-likelihood of the model, given by

L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i) \log(1 - p_i) \qquad (A3)


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression has no closed-form solution; although the problem is convex, it is typically solved using iterative gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
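As a concrete illustration of this fitting step, here is a minimal Python sketch; it is not the code used for the analyses (which relied on Matlab's glmfit), statsmodels' Logit is used as a stand-in, and the feature arrays below are hypothetical placeholders for the measured saliency, edge–energy, relevance, orientedness, and verticalness values.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical inputs: one row per patch, one column per visual feature
# (saliency, edge-energy, relevance, orientedness, verticalness).
features_fixated = np.random.randn(1000, 5)   # placeholder for fixated patches
features_control = np.random.randn(1000, 5)   # placeholder for shuffled controls

# Stack fixated patches (y = 1) and shuffled controls (y = 0), as in Equation A2
X = np.vstack([features_fixated, features_control])
y = np.concatenate([np.ones(len(features_fixated)),
                    np.zeros(len(features_control))])

# Standardize features (an illustrative choice, to make coefficients comparable)
# and add the bias column of Equation A1
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = sm.add_constant(X)

# Maximum-likelihood fit of the logistic model, i.e., minimizing Equation A3
model = sm.Logit(y, X).fit(disp=False)
print(model.params)   # beta coefficients (bias term first)
print(model.llf)      # log-likelihood of the fitted model
```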

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

D_0 = -2 \log(L_0 / L_{sat}) \qquad (A4)

D = -2 \log(L / L_{sat}) \qquad (A5)

where L_0, L_{sat}, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

D - D_0 = -2 \log(L / L_0) \qquad (A6)

A negative likelihood ratio suggests that the model under consideration outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
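Continuing the hypothetical sketch above, the comparison against the bias-only model can be written out explicitly. Note that the code computes the conventional positive statistic 2 log(L/L_0), which is simply the negative of D − D_0 as defined in Equation A6, and uses scipy's chi-squared survival function for the p-value; X and y are the placeholder arrays from the previous block.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# X and y are the hypothetical design matrix and labels from the previous sketch
full_model = sm.Logit(y, X).fit(disp=False)                      # model under consideration
null_model = sm.Logit(y, np.ones((len(y), 1))).fit(disp=False)   # bias-only null model

ll_full = full_model.llf   # log of L
ll_null = null_model.llf   # log of L0

# Positive likelihood-ratio statistic 2*log(L/L0) = -(D - D0),
# chi-squared distributed with one degree of freedom per visual feature
lr_stat = 2.0 * (ll_full - ll_null)
df = X.shape[1] - 1
p_value = stats.chi2.sf(lr_stat, df)
print(f"LR = {lr_stat:.1f}, df = {df}, p = {p_value:.3g}")
```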

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to that of the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, R_D^2 = 1 - L / L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature (or a subset of features), we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets and stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power gained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2014). The relative measure is defined as 1 - L_full / L_partial, where L_full and L_partial are the log-likelihoods of the full and partial models. Confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R² was in each case computed on the held-out test set.
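A minimal sketch of this leave-one-feature-out procedure, again building on the hypothetical X and y arrays above, is given below; the 10-fold split and the 1 − L_full/L_partial formula follow the definitions in this section, while the variable and feature names are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import KFold

feature_names = ["saliency", "edge_energy", "relevance", "orientedness", "verticalness"]

def test_loglik(train_idx, test_idx, cols):
    """Fit on the training fold using the given columns, return the test-set log-likelihood."""
    fit = sm.Logit(y[train_idx], X[np.ix_(train_idx, cols)]).fit(disp=False)
    p = fit.predict(X[np.ix_(test_idx, cols)])
    yt = y[test_idx]
    return np.sum(yt * np.log(p) + (1.0 - yt) * np.log(1.0 - p))

relative_r2 = {name: [] for name in feature_names}
for train_idx, test_idx in KFold(n_splits=10, shuffle=True).split(X):
    all_cols = np.arange(X.shape[1])               # column 0 is the bias term
    ll_full = test_loglik(train_idx, test_idx, all_cols)
    for j, name in enumerate(feature_names, start=1):
        partial_cols = np.delete(all_cols, j)      # leave this one feature out
        ll_partial = test_loglik(train_idx, test_idx, partial_cols)
        relative_r2[name].append(1.0 - ll_full / ll_partial)

# Mean and standard error of the relative pseudo-R2 across the 10 folds
for name in feature_names:
    vals = np.asarray(relative_r2[name])
    print(f"{name}: {vals.mean():.4f} +/- {vals.std(ddof=1) / np.sqrt(len(vals)):.4f}")
```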



Serences J T amp Yantis S (2006) Selective visualattention and perceptual coherence Trends inCognitive Sciences 10 38ndash45

lsquot Hart B M Schmidt H C E F Roth C ampEinhauser W (2013) Fixations on objects innatural scenes Dissociating importance from sa-lience Frontiers in Psychology 4

Tatler B W amp Vincent B T (2009) The prominenceof behavioral biases in eye guidance VisualCognition 17 1029ndash1054

Torralba A amp Oliva A (2003) Statistics of naturalimage categories Network Computation in NeuralSystems 14 391ndash412

Treisman A M amp Gelade G (1980) A feature-integration theory of attention Cognitive Psychol-ogy 12 97ndash136

Tseng P H Carmi R Cameron I G M MunozD P amp Itti L (2009) Quantifying center bias ofobservers in free viewing of dynamic naturalscenes Journal of Vision 9(7)4 1ndash17 httpwwwjournalofvisionorgcontent974 doi101167974 [PubMed] [Article]

Walther D amp Koch C (2006) Modeling attention tosalient proto-objects Neural Networks 19 1395ndash1407

Wilming N Harst S Schmidt N amp Konig P

(2013) Saccadic momentum of facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9 e1002871

Wolfe J M (1994) Guided search 20 A revisedmodel of visual search Psychonomic Bulletin andReview 1 202ndash238

Zelinsky G J (2008) A theory of eye movementsduring target acquisition Psychological Review115 787ndash835

Zelinsky G J Adeli H Peng Y amp Samaras D(2013) Modeling eye movements in a categoricalsearch task In Philosophical Transactions of theRoyal Society B 368 20130058

Zelinsky G J Rao R Hayhoe M amp Ballard D(1997) Eye movements reveal the spatio-temporaldynamics of visual search Psychological Science 8448ndash453

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classificationproblems with the two classes in our problemrepresented by fixated patches and shuffled controlsrespectively

In logistic regression the output is modeled as aBernoulli random variable with mean probability ofsuccessful outcome given by

pethy frac14 1jXbTHORN frac14 1

1thorn eXbethA1THORN

A Bernoulli random variable represents the outcomeof a biased coin toss in our problem the potentialoutcomes of the toss represent the possibilities ofwhether the patch was truly fixated or whether it was arandom shuffled control

yBernoulli pethy frac14 1jXbTHORNf g ethA2THORNThe probability of the outcome is then defined by the

joint influence of independent variables in the designmatrix X Here each column of X comprises the visualfeatures extracted from the image patches centered ateach fixation and each row is a fixation A bias term isalso included

The regression problem is then solved by estimatingthe coefficients b that minimize the negative loglikelihood of the model given by

L frac14 Xi

yilogethpiTHORN Xi

eth1 yiTHORNlogeth1 piTHORN ethA3THORN

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 19

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Unlike linear regression where the residuals arenormally distributed and a closed-form solution existslogistic regression is a convex problem in that it istypically solved using gradient-based methods Inpractice we implemented the method using Matlabrsquosglmfit function

Goodness of fit and model comparison usinglikelihood ratios

We computed the likelihood ratios between themodel under consideration and a null model with onlya bias term The likelihood ratios are defined in termsof the model deviances given by

D0 frac14 2logethL0=LsatTHORN ethA3THORN

D frac14 2logethL=LsatTHORN ethA4THORNwhere L0 Lsat and L are the likelihoods of the nullmodel (one that only predicts the bias in the data) thesaturated model (one that perfectly predicts the data)and the model under consideration The likelihoodratio is then given by

DD0 frac14 2logethL=L0THORN ethA5THORNA negative likelihood ratio suggests that the model

under consideration significantly outperforms the nullmodel The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into agoodness of fit v2 measure given the number ofindependent variables in the model

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R2 The idea of the pseudo-R2 metric is to map thelikelihood ratio into a [0 1] range thus offering anintuition similar to the R2 for normally distributeddata

Many definitions exist for the pseudo-R2 but weused McFaddenrsquos formula RD

2frac14 1 LL0 where L isthe log-likelihood of the model under considerationand L0 is the log-likelihood of the baseline (intercept-only) model (McFadden et al 1974)

To estimate the relative effect size of each feature ora subset of features we used a leave-one-feature-out(leave-some-features-out) technique a variant of bestsubsets of stepwise regression techniques For eachfeature we fit a partial model comprising all but thatparticular feature (or subset) and computed thepseudo-R2 of the partial model Then to obtain ameasure of the relative increase in predictive powerobtained by adding that feature (or subset) back to thepartial model we used a measure called relativepseudo-R2 (Fernandes et al 2013) The relativemeasure is defined as 1 LfullLpartial where Lfull andLpartial are the log-likelihoods of the full and partialmodels The confidence intervals on this statistic wereobtained by computing standard errors on 10 cross-validation folds the relative pseudo-R2 in each case wascomputed on the held-out test set

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 20

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

  • Introduction
  • Methods
  • f01
  • e01
  • f02
  • f03
  • e02
  • e03
  • e04
  • e05
  • e06
  • e07
  • f04
  • Results
  • f05
  • f06
  • f07
  • t01
  • Discussion
  • f08
  • t02a
  • t02b
  • t02c
  • t02d
  • Conclusions
  • n1
  • Baddeley1
  • Berg1
  • Bisley1
  • Borji1
  • Bravo1
  • Bravo2
  • Castelhano1
  • Cristino1
  • Ehinger1
  • Einhauser1
  • Einhauser2
  • Engmann1
  • Fecteau1
  • Fernandes1
  • Findlay1
  • Ganguli1
  • Geisler1
  • Gottlieb1
  • Gottlieb2
  • Hays1
  • Henderson1
  • Hooge1
  • Hwang1
  • Itti1
  • Itti2
  • Kanan1
  • Kano1
  • Kienzle1
  • Krieger1
  • Kuo1
  • LeMeur1
  • Malcolm1
  • McFadden1
  • Najemnik1
  • Najemnik2
  • Najemnik3
  • Navalpakkam1
  • Nelder1
  • Oliva1
  • Phillips1
  • Rajashekar1
  • Rajashekar2
  • Reeder1
  • Reinagel1
  • Renninger1
  • Schutz1
  • Serences1
  • ltHart1
  • Tatler2
  • Torralba1
  • Treisman1
  • Tseng1
  • Walther1
  • Wilming1
  • Wolfe1
  • Zelinsky1
  • Zelinsky2
  • Zelinsky3
  • Appendix A Logistic regression
  • e08
  • e09
  • e10
  • e11
  • e12
  • e13
Page 10: Pavan Ramkumar - Modeling peripheral visual acuity enables discovery …pavanramkumar.github.io/pdfs/08-Ramkumar_etal_JVis_2015.pdf · 2017-01-24 · Modeling peripheral visual acuity

Evolution of search strategy within a trial

Our goal was to study the temporal evolution of search strategy at multiple time scales. To understand how saccade planning evolved at the relatively short time scale of a single trial, we correlated the visual features at fixation with the fixation order (first, second, third, etc.) and with the length of the saccade made to land at that fixation. We observed that fixated locations had different features at early and late fixations. In particular, saliency, edge–energy, relevance, orientedness, and verticalness of fixated patches increased from the beginning to the end of the trial (Figure 7 and Supplementary Figure S3, filled black diamonds; p < 0.0001 for both tasks). We also found that saliency, edge–energy, relevance, and verticalness of fixated patches were negatively correlated (p < 0.0001 for both animals and tasks) with the length of the saccade made to get to that location (Figure 7 and Supplementary Figure S2, filled blue squares). By contrast, orientedness was positively correlated with fixation number and saccade length (Figure 7 and Supplementary Figure S2). These effects are abolished, as expected, when the same analysis is performed with shuffled controls instead of fixated patches (unfilled black diamonds and blue squares in Figure 7 and Supplementary Figure S2). Crucially, these effects are only consistent across animals and tasks when a retinal transform is applied (last data point in all panels of Figure 7 and Supplementary Figure S2). We further verified that these phenomena are independent; that is, although patches of high relevance or edge–energy are fixated later in the trial and reached by short saccades, this is not merely because late saccades are short on average. In particular, we found significant but very weak correlations between fixation order and saccade length for either monkey (across tasks, r = −0.03 for the vertical task and r = 0.02 for the horizontal task; p < 0.01 for both correlations), which were not sufficient to explain the high correlations observed with visual features at fixation. Together, these results show how search strategy evolves during the trial, with predictive visual features being fixated later in the trial.
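In terms of implementation, this within-trial analysis amounts to rank-correlating each visual feature at fixation with the fixation's ordinal position and with the length of the incoming saccade. A minimal sketch of that computation on synthetic data (array names are placeholders, not the authors' code):

import numpy as np
from scipy.stats import spearmanr

# Hypothetical inputs, one entry per fixation:
#   feature_at_fix: value of one visual feature (e.g., edge-energy) at each fixated patch
#   fixation_order: ordinal position of the fixation within its trial (1st, 2nd, ...)
#   saccade_length: amplitude of the saccade that landed at that fixation
rng = np.random.default_rng(0)
n = 1000
fixation_order = rng.integers(1, 10, size=n)
saccade_length = rng.gamma(2.0, 2.0, size=n)
feature_at_fix = 0.1 * fixation_order - 0.05 * saccade_length + rng.normal(size=n)

rho_order, p_order = spearmanr(feature_at_fix, fixation_order)
rho_len, p_len = spearmanr(feature_at_fix, saccade_length)
print(f"feature vs. fixation order: rho = {rho_order:.2f}, p = {p_order:.3g}")
print(f"feature vs. saccade length: rho = {rho_len:.2f}, p = {p_len:.3g}")

The same two correlations would be repeated for each feature and again for the shuffled-control patches.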

Multivariate analysis of the evolution of search strategy over days

Although ROC analysis suggests that saliency, relevance, and edge–energy can predict fixations, one fundamental limitation of the technique is that it can only quantify the effect of one variable at a time. However, in practice these variables tend to be correlated. In our data, for instance, relevance is clearly correlated with edge–energy (ρ = 0.25, p < 0.0001; 16 sessions, two animals, 28 × 28 patch with retinal transform, 57,775 fixations), and edge–energy is correlated with saliency (ρ = 0.02, p < 0.0001). The correlation is problematic because the true effect of edge–energy might manifest as an apparent effect of saliency or relevance. To address this concern, we applied multivariate logistic regression to decode fixated from nonfixated patches as a function of local visual features (see Methods).

Visual features could discriminate fixated from nonfixated patches

Using multivariate logistic regression, we asked whether fixated patches could be decoded from shuffled-control patches. We found that fixated patches and shuffled controls could be decoded reliably above chance level (50%). Decoding accuracies did not differ significantly across tasks (vertical vs. horizontal target). Fixated patches were better decoded when peripheral visual patches were not degraded by the retinal transform (last column of Table 1).

Figure 6. Area under receiver operating characteristic (ROC) curves (AUCs) comparing the distribution of visual features at fixated patches and shuffled controls across approximately 55,000 fixations (pooled across 16 sessions from two monkeys) for each task and each patch size. Black bars represent the vertical Gabor task and red curves represent the horizontal task. Error bars show bootstrapped 95% confidence intervals.

Figure 7. How does search strategy evolve within a trial? Spearman's rank correlation coefficients between fixation order (black diamonds) or saccade length (blue squares) and the visual features saliency, edge–energy, relevance, orientedness, and verticalness are shown for one monkey (MAS15) for each task. Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls. Error bars show bootstrapped 95% confidence intervals. The symbols on the x-axis indicate the image processing done prior to feature computation: the first three data points correspond to a retinal transform followed by averaging over 18 × 18, 28 × 28, and 48 × 48 windows, respectively; the fourth data point indicates averaging over a 28 × 28 window without applying any retinal transform. The star indicates a significance level of p < 0.0001.
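The decoding accuracies summarized in Table 1 come from a cross-validated two-class problem of this kind: features at fixated patches versus features at shuffled controls. A minimal sketch on synthetic data (the authors fit the model with Matlab's glmfit; scikit-learn is used here only for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_fix, n_features = 5000, 5  # saliency, edge-energy, relevance, orientedness, verticalness
X_fixated = rng.normal(loc=0.2, size=(n_fix, n_features))  # placeholder features at fixated patches
X_control = rng.normal(loc=0.0, size=(n_fix, n_features))  # placeholder features at shuffled controls
X = np.vstack([X_fixated, X_control])
y = np.concatenate([np.ones(n_fix), np.zeros(n_fix)])

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=10, scoring="accuracy")  # 10-fold cross-validation, as in Table 1
print(f"decoding accuracy: {100 * acc.mean():.1f} +/- {100 * acc.std():.1f} %")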

Relative influence of visual features on saccade choice

Having established that the visual features at fixation could predict eye movement behavior, we were interested in the relative influence of individual features. By decoding fixated patches from shuffled controls in the two tasks, we found that edge–energy was predominantly more influential than all other features in predicting fixations, and that saliency was completely explained away for both monkeys (top two panels in Figure 8; Table 2). These results appear to agree with findings in human studies, which suggest that bottom-up saliency plays a minimal role in explaining saccades during search tasks (e.g., Einhauser et al., 2008). However, it must be noted that if edge–energy is a bottom-up feature, then bottom-up features appear to play an important role in gaze prediction even during search.
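The explaining-away logic can be illustrated with a toy simulation: when two predictors are correlated but only one actually drives the choice, each looks predictive on its own (ROC), whereas a joint logistic fit concentrates the weight on the true driver. A sketch with synthetic "saliency" and "edge-energy" variables (purely illustrative, not the monkey data):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 20000
edge_energy = rng.normal(size=n)
saliency = 0.8 * edge_energy + 0.6 * rng.normal(size=n)  # correlated with edge-energy
p = 1.0 / (1.0 + np.exp(-1.5 * edge_energy))             # fixation choice depends on edge-energy only
y = rng.binomial(1, p)

auc_sal = roc_auc_score(y, saliency)       # univariate view: saliency looks predictive
auc_ee = roc_auc_score(y, edge_energy)
beta = LogisticRegression(max_iter=1000).fit(np.column_stack([saliency, edge_energy]), y).coef_[0]
print(f"AUC saliency = {auc_sal:.2f}, AUC edge-energy = {auc_ee:.2f}")
print(f"multivariate betas: saliency = {beta[0]:.2f}, edge-energy = {beta[1]:.2f}")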

Given the demands of the search task (finding a Gabor target in a natural scene), it is worth considering the possibility that edge–energy is not strictly a bottom-up feature, but that locations with high edge–energy are sought after by the monkeys simply because Gabor wavelets have high edge–energy. Although seeking high-energy patches is one possible search strategy, our Fourier analysis (Figure 5) and ROC analysis (Figure 6) suggest that target similarity (relevance) or target orientation also plays a role. Indeed, when using multivariate decoding, we found that task-relevant features had nonzero effects (bottom three panels in Figure 8; Table 2). In particular, relevance played an important role for MAS15 but not MAS16, whereas the reverse was true for orientedness (Figure 8, panels 3 and 4 from above; Table 2). On the basis of this result, it is not possible to conclude whether the internal search template guiding top-down saccade choice comprises the target in its entirety (relevance), its orientedness, or the target orientation (verticalness); one dominant strategy does not win out across data from both animals and tasks, but all seem to be important.

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategy over consecutive days, monkey MAS15 had performed the vertical search task ahead of the horizontal task, and the order of tasks was swapped for MAS16. Since these tasks were separated by a few weeks, we had the opportunity to compare search strategy across tasks. Indeed, for the 28 × 28 window, we observed a significant difference in the effect size of verticalness (a feature that captures the predominant orientation) across the two tasks. In particular, for MAS15, who had performed the horizontal task later, verticalness is significantly predictive of saccades in the horizontal task, with shuffled-control patches being more vertical than fixated patches (β ± SE = −0.0546 ± 0.0161, p = 0.0009; see Table 2B and bottom panels in Figure 8). For MAS16, who had performed the vertical task later, verticalness is significantly predictive of saccades in the vertical task, with fixated patches being more vertical than shuffled-control patches (β ± SE = 0.1232 ± 0.0201, p < 0.0001; see Table 2C and bottom panels in Figure 8). This practice effect of verticalness also correlates with search performance, as quantified by the number of fixations required to find the target. In particular, a Wilcoxon rank-sum test for different median numbers of fixations in the earlier and later tasks suggests that both monkeys improved across tasks (p < 0.00001, z = 29.1, median of 6 vs. 4 fixations for MAS15; and p < 0.00001, z = 8.8, median of 8 vs. 7 fixations for MAS16). From these results, it appears that the animals gradually learn to place more emphasis on the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.

Table 1. Decoding accuracies (mean percentage of correctly predicted patches ± standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.

                          Retinal transform   Retinal transform   Retinal transform   No retinal transform
                          18 × 18             28 × 28             48 × 48             28 × 28
MAS15, vertical task      55.7 ± 0.9          55.5 ± 0.6          56.8 ± 0.5          61.5 ± 0.7
MAS15, horizontal task    56.5 ± 0.6          56.4 ± 0.8          56.6 ± 0.8          64.3 ± 1.0
MAS16, vertical task      58.4 ± 0.6          60.2 ± 1.3          60.5 ± 1.4          62.9 ± 0.7
MAS16, horizontal task    59.2 ± 0.8          60.9 ± 0.6          62.3 ± 1.3          62.3 ± 1.0
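The across-task improvement reported above was assessed with a Wilcoxon rank-sum test on the per-trial number of fixations needed to find the target. A minimal sketch with synthetic fixation counts (illustrative only, not the recorded data):

import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(3)
fix_earlier = rng.poisson(6, size=800) + 1  # hypothetical per-trial fixation counts, earlier task
fix_later = rng.poisson(4, size=800) + 1    # hypothetical per-trial fixation counts, later task
stat, p = ranksums(fix_earlier, fix_later)
print(f"median earlier = {np.median(fix_earlier):.0f}, median later = {np.median(fix_later):.0f}")
print(f"Wilcoxon rank-sum z = {stat:.2f}, p = {p:.3g}")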

Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance and how they evolved over time when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant features while allowing for bottom-up features to explain them away. We found that over the short time scale of a single trial, monkeys preferred to fixate target locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades. As the trial progressed, locations with higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated. Over longer time scales of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features improves upon a partial model excluding that feature. Effect sizes (relative pseudo-R2 on the test set) of features are shown for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.

Table 2. Regression tables for logistic regression decoding of 28 × 28 fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and p values, together with log-likelihoods, chi-squared (χ²) statistics, and pseudo-R² values (R²_D), as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A. MAS15, vertical target. LR: χ²(6) = 704.4, p = 0; log-likelihood = −23,992; standardized pseudo-R² (test set) = 0.0143.

Variable       β        SE       t         p         Likelihood ratio   p
Constant       0.0180   0.0108   1.6706    0.0954
Saliency       0.0085   0.0108   0.7809    0.4503    0.6797             0.9442
Edge–energy    0.2612   0.0131   19.8995   <0.0001   455.2963           0
Relevance      0.1154   0.0133   8.6978    <0.0001   83.1493            <0.0001
Orientedness   0.0268   0.0120   2.2281    0.0319    5.0343             0.3035
Verticalness   0.0096   0.0126   0.7664    0.4607    0.6666             0.9450

Part B. MAS15, horizontal target. LR: χ²(6) = 704.4, p = 0; log-likelihood = −16,714; standardized pseudo-R² (test set) = 0.0203.

Variable       β        SE       t         p         Likelihood ratio   p
Constant       0.0337   0.0130   2.5918    0.0097
Saliency       0.0287   0.0131   2.1859    0.0341    4.8705             0.3187
Edge–energy    0.2880   0.0177   16.2238   <0.0001   316.5875           0
Relevance      0.1889   0.0176   10.7034   <0.0001   131.5860           0
Orientedness   0.0359   0.0163   2.2012    0.0417    4.9799             0.3190
Verticalness   −0.0546  0.0161   −3.3997   0.0009    11.6278            0.0258

Part C. MAS16, vertical target. LR: χ²(6) = 769.5, p = 0; log-likelihood = −10,576; standardized pseudo-R² (test set) = 0.0346.

Variable       β        SE       t         p         Likelihood ratio   p
Constant       0.0861   0.0165   5.2211    <0.0001
Saliency       0.0032   0.0164   0.1958    0.7475    0.1470             0.9953
Edge–energy    0.5386   0.0253   21.2555   <0.0001   584.6116           0
Relevance      0.0380   0.0197   1.9298    0.0621    3.9210             0.4398
Orientedness   0.1494   0.0198   7.5346    <0.0001   57.1285            <0.0001
Verticalness   0.1232   0.0201   6.1210    <0.0001   37.5927            <0.0001

Part D. MAS16, horizontal target. LR: χ²(6) = 1099.4, p = 0; log-likelihood = −14,001; standardized pseudo-R² (test set) = 0.0372.

Variable       β        SE       t         p         Likelihood ratio   p
Constant       0.0775   0.0144   5.3812    <0.0001
Saliency       0.0069   0.0143   0.4846    0.6132    0.4056             0.9694
Edge–energy    0.6241   0.0236   26.4207   <0.0001   953.3296           0
Relevance      0.0025   0.0164   0.1493    0.7738    0.1008             0.9982
Orientedness   0.0707   0.0178   3.9771    0.0002    15.9725            0.0059
Verticalness   0.0224   0.0173   1.2956    0.2320    1.8565             0.7618

The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 does (Figure 8), and edge–energy, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or the other features.

One limitation of our approach is that, although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between particular visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and the factors that influence saccade choice (see, e.g., Einhauser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhauser, 2013). In future studies we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings in monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework, studying visual features at fixation to infer saccade-planning strategy, is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract gaze. These studies were done in humans but generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). That study also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed clear differences in power spectra between fixated and nonfixated patches, and also between these spectral differences across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but that this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note for interpreting the power-spectral differences we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished, while that of task-relevant features is enhanced, during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhauser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schutz, Trommershauser, & Gegenfurtner, 2012). Although none of these studies has explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step toward understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and that the influence of bottom-up features is diminished during search, our study might not be ideal for examining the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings among human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhauser & Konig, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales predicted fixation behavior better than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach of modeling peripheral degradation for feature extraction, and of deploying an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by matching incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, an approximate internal template allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their power to predict fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric based on an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness, horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search under naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time, and how distinct factors are important at different time scales. Our study thus establishes essential groundwork for examining, with neurophysiological methods in nonhuman primate models, the neural representation of priority and the neural mechanisms by which a saccade decision is computed.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC < 0.5 suggests that M(fixated) < M(controls).
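Concretely, the AUC in this footnote is the area under the ROC curve obtained when a single feature is used to separate fixated patches from shuffled controls; a value below 0.5 means the feature tends to be smaller at fixated patches. A minimal sketch with placeholder feature vectors:

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
feature_fixated = rng.normal(loc=-0.1, size=5000)  # hypothetical feature values at fixated patches
feature_control = rng.normal(loc=0.0, size=5000)   # hypothetical feature values at shuffled controls
scores = np.concatenate([feature_fixated, feature_control])
labels = np.concatenate([np.ones(5000), np.zeros(5000)])  # fixated = 1, control = 0
auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.3f}  (< 0.5 implies M(fixated) < M(controls))")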

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, doi:10.1167/9.5.19.
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.
Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.
Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, doi:10.1167/9.1.34.
Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.
Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.
Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.
Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.
Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.
Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, doi:10.1167/8.2.2.
Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.
Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.
Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.
Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.
Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, doi:10.1167/10.7.533. [Abstract]
Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98: Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.
Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.
Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. H. Fischer, W. S. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.
Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.
Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, doi:10.1167/9.5.25.
Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.
Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.
Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, doi:10.1167/9.5.7.
Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.
Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.
Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, doi:10.1167/9.11.8.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, doi:10.1167/8.3.4.
Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.
Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, doi:10.1167/6.4.7.
Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).
Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, doi:10.1167/13.3.13.
Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.
Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, doi:10.1167/7.3.6.
Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.
't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.
Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, doi:10.1167/9.7.4.
Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.
Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.
Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.
Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems; the two classes in our problem are represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable, with the mean probability of a successful outcome given by

p(y = 1 \mid X, \beta) = \frac{1}{1 + e^{-X\beta}}    (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities that the patch was truly fixated or that it was a randomly shuffled control:

y \sim \mathrm{Bernoulli}\left\{ p(y = 1 \mid X, \beta) \right\}    (A2)

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X comprises one of the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients β that minimize the negative log likelihood of the model, given by

\mathcal{L} = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i)    (A3)


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
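For readers without access to glmfit, the same fit can be reproduced by minimizing the negative log likelihood of Equation A3 directly with a generic gradient-based optimizer; a minimal sketch on synthetic data (not the authors' implementation):

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    # Equation A1: p(y = 1 | X, beta) = 1 / (1 + exp(-X beta))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    eps = 1e-12  # numerical guard against log(0)
    # Equation A3: -sum y log(p) - sum (1 - y) log(1 - p)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(5)
n, d = 2000, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, d))])  # bias column + five visual features
beta_true = np.array([0.0, 0.1, 0.5, 0.2, 0.0, 0.0])        # hypothetical ground truth
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

fit = minimize(neg_log_likelihood, x0=np.zeros(d + 1), args=(X, y), method="BFGS")
print("estimated coefficients:", np.round(fit.x, 2))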

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

D_0 = -2\log\left( L_0 / L_{\mathrm{sat}} \right)    (A4)

D = -2\log\left( L / L_{\mathrm{sat}} \right)    (A5)

where L_0, L_sat, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration, respectively. The likelihood ratio is then given by

D - D_0 = -2\log\left( L / L_0 \right)    (A6)

A negative likelihood ratio suggests that the model under consideration outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
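As a sketch of how this comparison can be carried out in practice (hypothetical log likelihoods, not values from this study), the likelihood ratio of Equation A6 can be converted to a p value with a chi-squared test whose degrees of freedom equal the number of regressors added beyond the intercept:

from scipy.stats import chi2

def likelihood_ratio_test(log_lik_model, log_lik_null, df):
    # Equation A6: D - D0 = -2 log(L / L0); negative when the model beats the null.
    lr = -2.0 * (log_lik_model - log_lik_null)
    chi2_stat = -lr  # 2 log(L / L0) >= 0 for a nested null model
    p_value = chi2.sf(chi2_stat, df)
    return lr, chi2_stat, p_value

# Hypothetical log likelihoods for a model with five features vs. an intercept-only null:
lr, stat, p = likelihood_ratio_test(log_lik_model=-2400.0, log_lik_null=-2750.0, df=5)
print(f"D - D0 = {lr:.1f}, chi-squared = {stat:.1f}, p = {p:.3g}")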

Relative pseudo-R2

A metric related to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into the [0, 1] range, thus offering an intuition similar to that of the R² for normally distributed data.

Many definitions exist for the pseudo-R²; we used McFadden's formula, R_D^2 = 1 - L/L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature, or of a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets and stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power gained by adding that feature (or subset) back into the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2013). The relative measure is defined as 1 - L_full/L_partial, where L_full and L_partial are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
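A minimal sketch of this leave-one-feature-out procedure, with the relative pseudo-R² evaluated on held-out folds (synthetic data; scikit-learn is used here for convenience rather than the authors' Matlab pipeline):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def test_log_likelihood(X_tr, y_tr, X_te, y_te):
    # Fit on the training fold and evaluate the log likelihood on the held-out fold.
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]
    eps = 1e-12
    return np.sum(y_te * np.log(p + eps) + (1 - y_te) * np.log(1 - p + eps))

def relative_pseudo_r2(X, y, feature_idx, n_folds=10):
    # Relative pseudo-R2 = 1 - L_full / L_partial, averaged over held-out folds.
    cols = [j for j in range(X.shape[1]) if j != feature_idx]
    vals = []
    for tr, te in StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0).split(X, y):
        ll_full = test_log_likelihood(X[tr], y[tr], X[te], y[te])
        ll_partial = test_log_likelihood(X[tr][:, cols], y[tr], X[te][:, cols], y[te])
        vals.append(1.0 - ll_full / ll_partial)
    return np.mean(vals), np.std(vals) / np.sqrt(n_folds)  # mean and standard error across folds

# Hypothetical data: five visual features, fixated (1) vs. shuffled-control (0) patches.
rng = np.random.default_rng(6)
X = rng.normal(size=(4000, 5))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.6 * X[:, 1] + 0.2 * X[:, 2]))))
for j, name in enumerate(["saliency", "edge-energy", "relevance", "orientedness", "verticalness"]):
    m, se = relative_pseudo_r2(X, y, j)
    print(f"{name:>13s}: relative pseudo-R2 = {m:.4f} +/- {se:.4f}")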


  • Introduction
  • Methods
  • f01
  • e01
  • f02
  • f03
  • e02
  • e03
  • e04
  • e05
  • e06
  • e07
  • f04
  • Results
  • f05
  • f06
  • f07
  • t01
  • Discussion
  • f08
  • t02a
  • t02b
  • t02c
  • t02d
  • Conclusions
  • n1
  • Baddeley1
  • Berg1
  • Bisley1
  • Borji1
  • Bravo1
  • Bravo2
  • Castelhano1
  • Cristino1
  • Ehinger1
  • Einhauser1
  • Einhauser2
  • Engmann1
  • Fecteau1
  • Fernandes1
  • Findlay1
  • Ganguli1
  • Geisler1
  • Gottlieb1
  • Gottlieb2
  • Hays1
  • Henderson1
  • Hooge1
  • Hwang1
  • Itti1
  • Itti2
  • Kanan1
  • Kano1
  • Kienzle1
  • Krieger1
  • Kuo1
  • LeMeur1
  • Malcolm1
  • McFadden1
  • Najemnik1
  • Najemnik2
  • Najemnik3
  • Navalpakkam1
  • Nelder1
  • Oliva1
  • Phillips1
  • Rajashekar1
  • Rajashekar2
  • Reeder1
  • Reinagel1
  • Renninger1
  • Schutz1
  • Serences1
  • ltHart1
  • Tatler2
  • Torralba1
  • Treisman1
  • Tseng1
  • Walther1
  • Wilming1
  • Wolfe1
  • Zelinsky1
  • Zelinsky2
  • Zelinsky3
  • Appendix A Logistic regression
  • e08
  • e09
  • e10
  • e11
  • e12
  • e13
Page 11: Pavan Ramkumar - Modeling peripheral visual acuity enables discovery …pavanramkumar.github.io/pdfs/08-Ramkumar_etal_JVis_2015.pdf · 2017-01-24 · Modeling peripheral visual acuity

Figure 7 How does search strategy evolve within a trial Spearmanrsquos rank correlation coefficients between fixation order (black

diamonds) or saccade length (blue squares) and visual features saliency edgendashenergy relevance orientedness and verticalness are

shown for one monkey (MAS15) for each task Filled shapes show effects for fixated patches and unfilled shapes for shuffled controls

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 11

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

patches were not degraded by the retinal transform(last column of Table 1)

Relative influence of visual features on saccade choice

Having established that the visual features at fixationcould predict eye movement behavior we were interestedin the relative influence of individual features Bydecoding fixated patches from shuffled controls in thetwo tasks we found that edgendashenergy was predominantlymore influential than all other features in predictingfixations and saliency was completely explained away forboth monkeys (top two panels in Figure 8 Table 2)These results appear to agree with findings in humanstudies which suggest that bottom-up saliency plays aminimal role in explaining saccades during search tasks(eg Einhauser et al 2008) However it must be notedthat if edgendashenergy is a bottom-up feature then bottom-up features appear to play an important role in gazeprediction even during search

Given the demands of the search task (finding aGabor target in a natural scene) it is worth consideringthe possibility that edgendashenergy is not strictly a bottom-up feature but that locations with high edgendashenergy aresought after by the monkeys simply because Gaborwavelets have high edgendashenergy Although seekinghigh-energy patches is one possible search strategy ourFourier analysis (Figure 5) and ROC analysis (Figure6) suggest that target similarity (relevance) or targetorientation also plays a role Indeed when usingmultivariate decoding we found that task-relevantfeatures had nonzero effects (bottom three panels inFigure 8 Table 2) In particular relevance played animportant role for MAS15 but not MAS16 whereasthe reverse was true for orientedness (Figure 8 panels 3and 4 from above Table 2) On the basis of this resultit is not possible to conclude whether the internalsearch template guiding top-down saccade choice is

comprised of the target in its entirety (relevance)orientedness or the target orientation (verticalness)one dominant strategy does not win out across datafrom both animals and tasks but all seem to beimportant

Evolution of search strategy over multiple weeks

Although we did not find a change in search strategyover consecutive days monkey MAS15 had performedthe vertical search task ahead of the horizontal taskand the order of tasks was swapped for MAS16 Sincethese tasks were separated by a few weeks we had theopportunity to compare search strategy across tasksIndeed for the 28 middot 28 window we observed asignificant difference in the effect size of verticalness (afeature that captures the predominant orientation)across the two tasks In particular for MAS15 whohad performed the horizontal task later verticalness issignificantly predictive of saccades in the horizontaltask with shuffled-control patches being more verticalthat fixated patches (b 6 SEfrac1400546 6 00161 p frac1400009 see Table 2B and bottom panels in Figure 8)For MAS16 who had performed the vertical task laterverticalness is significantly predictive of saccades in thevertical task with fixated patches being more verticalthan shuffled-control patches (b 6 SE frac14 01232 600201 p 00001 see Table 2C and bottom panels inFigure 8) This practice effect of verticalness alsocorrelates with search performance as quantified bynumber of fixations required to find the target Inparticular a Wilcoxon rank-sum test for differentmedian number of fixations for earlier and later taskssuggests that both monkeys improve across tasks (p 000001 z frac14 291 median of 6 vs 4 fixations forMAS15 and p 000001 zfrac14 88 median of 8 vs 7fixations for MAS16) From these results it appearsthat animals gradually learn to place more emphasis on

Table 1. Decoding accuracies (mean percentage of correctly predicted patches and standard deviation across 10 cross-validation folds) for fixated patches versus shuffled controls, using all bottom-up and task-relevant visual features as inputs to a logistic regression.

                          Retinal transform,  Retinal transform,  Retinal transform,  No retinal transform,
                          1° × 1°             2° × 2°             4° × 4°             2° × 2°
MAS15, vertical task      55.7 ± 0.9          55.5 ± 0.6          56.8 ± 0.5          61.5 ± 0.7
MAS15, horizontal task    56.5 ± 0.6          56.4 ± 0.8          56.6 ± 0.8          64.3 ± 1.0
MAS16, vertical task      58.4 ± 0.6          60.2 ± 1.3          60.5 ± 1.4          62.9 ± 0.7
MAS16, horizontal task    59.2 ± 0.8          60.9 ± 0.6          62.3 ± 1.3          62.3 ± 1.0

[Figure caption] Error bars show bootstrapped 95% confidence intervals. The symbols on the x-axis indicate the image processing done prior to feature computation: the first three data points correspond to a retinal transform followed by averaging over 1° × 1°, 2° × 2°, and 4° × 4° windows, respectively; the fourth data point indicates averaging over a 2° × 2° window without applying any retinal transform. The star indicates a significance level of p < 0.0001.
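The decoding accuracies in Table 1 come from 10-fold cross-validated logistic regression. The Matlab sketch below illustrates that procedure under simplifying assumptions: X and y are synthetic stand-ins for the feature matrix and fixated/control labels, and the feature weights used to simulate y are arbitrary.

```matlab
% Sketch: 10-fold cross-validated decoding of fixated patches vs. shuffled
% controls from visual features, as summarized in Table 1. X is a placeholder
% [nPatches x nFeatures] matrix (saliency, edge-energy, relevance,
% orientedness, verticalness); y is 1 for fixated patches and 0 for controls.
rng(1);
n = 2000;
X = randn(n, 5);
y = double(rand(n, 1) < 1 ./ (1 + exp(-X * [0; 0.4; 0.15; 0.05; 0.02])));
cvp = cvpartition(n, 'KFold', 10);
acc = zeros(cvp.NumTestSets, 1);
for k = 1:cvp.NumTestSets
    b = glmfit(X(training(cvp, k), :), y(training(cvp, k)), 'binomial', 'link', 'logit');
    yhat = glmval(b, X(test(cvp, k), :), 'logit') > 0.5;   % predicted class on held-out fold
    acc(k) = mean(yhat == y(test(cvp, k)));
end
fprintf('decoding accuracy: %.1f%% +/- %.1f%% (mean +/- SD over folds)\n', ...
        100 * mean(acc), 100 * std(acc));
```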



Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance, and how they evolved over time, when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant features while allowing for bottom-up features to explain them away. We found that over the short time scales of a single trial, monkeys preferred to fixate target locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades. As the trial progressed, locations of higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated. Over longer time scales of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.



Table 2. Regression tables for logistic regression decoding of 2° × 2° fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and p values, as well as log-likelihoods, chi-squared (χ²) statistics, pseudo-R²s (R_D²), and likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A: MAS15, vertical target. LR χ²(6) = 70.44, p = 0; log-likelihood = -2399.2; R_D² = 0.0143 (standardized pseudo-R², test set).

Variable        b        SE       t         p         Likelihood ratio   p
Constant        0.0180   0.0108   1.6706    0.0954
Saliency        0.0085   0.0108   0.7809    0.4503    0.6797             0.9442
Edge-energy     0.2612   0.0131   19.8995   <0.0001   455.2963           0
Relevance       0.1154   0.0133   8.6978    <0.0001   83.1493            <0.0001
Orientedness    0.0268   0.0120   2.2281    0.0319    5.0343             0.3035
Verticalness    0.0096   0.0126   0.7664    0.4607    0.6666             0.9450

Part B: MAS15, horizontal target. LR χ²(6) = 70.44, p = 0; log-likelihood = -1671.4; R_D² = 0.0203 (standardized pseudo-R², test set).

Variable        b         SE       t         p         Likelihood ratio   p
Constant        0.0337    0.0130   2.5918    0.0097
Saliency        0.0287    0.0131   2.1859    0.0341    4.8705             0.3187
Edge-energy     0.2880    0.0177   16.2238   <0.0001   316.5875           0
Relevance       0.1889    0.0176   10.7034   <0.0001   131.5860           0
Orientedness    0.0359    0.0163   2.2012    0.0417    4.9799             0.3190
Verticalness    -0.0546   0.0161   -3.3997   0.0009    11.6278            0.0258

Part C: MAS16, vertical target. LR χ²(6) = 76.95, p = 0; log-likelihood = -1057.6; R_D² = 0.0346 (standardized pseudo-R², test set).

Variable        b        SE       t         p         Likelihood ratio   p
Constant        0.0861   0.0165   5.2211    <0.0001
Saliency        0.0032   0.0164   0.1958    0.7475    0.1470             0.9953
Edge-energy     0.5386   0.0253   21.2555   <0.0001   584.6116           0
Relevance       0.0380   0.0197   1.9298    0.0621    3.9210             0.4398
Orientedness    0.1494   0.0198   7.5346    <0.0001   57.1285            <0.0001
Verticalness    0.1232   0.0201   6.1210    <0.0001   37.5927            <0.0001

Part D: MAS16, horizontal target. LR χ²(6) = 109.94, p = 0; log-likelihood = -1400.1; R_D² = 0.0372 (standardized pseudo-R², test set).

Variable        b        SE       t         p         Likelihood ratio   p
Constant        0.0775   0.0144   5.3812    <0.0001
Saliency        0.0069   0.0143   0.4846    0.6132    0.4056             0.9694
Edge-energy     0.6241   0.0236   26.4207   <0.0001   953.3296           0
Relevance       0.0025   0.0164   0.1493    0.7738    0.1008             0.9982
Orientedness    0.0707   0.0178   3.9771    0.0002    15.9725            0.0059
Verticalness    0.0224   0.0173   1.2956    0.2320    1.8565             0.7618



The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 (Figure 8), which, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or the other features.
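One way to probe this explanation, sketched below in Matlab, is to compare how far each feature map's 2-D autocorrelation extends around its peak. The feature maps here are synthetic placeholders (a smoothed map versus an unsmoothed one), not the paper's saliency or edge–energy maps, and xcorr2 requires the Signal Processing Toolbox.

```matlab
% Sketch (assumptions): edgeEnergyMap and saliencyMap stand in for per-pixel
% feature maps of one scene; here they are random placeholders.
edgeEnergyMap = conv2(rand(256), ones(9)/81, 'same');   % smoother map -> wider autocorrelation
saliencyMap   = rand(256);                              % rougher map  -> narrower autocorrelation
ac = @(m) xcorr2(m - mean(m(:)));                       % unnormalized 2-D autocorrelation
spread = @(a) sum(a(:) > 0.5 * max(a(:)));              % area (in pixels) above half the peak
fprintf('half-peak area: edge-energy %d, saliency %d\n', ...
        spread(ac(edgeEnergyMap)), spread(ac(saliencyMap)));
```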

One limitation of our approach is that, although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between certain visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies, we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings on monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce the statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority map models (see models reviewed in Borji et al., 2013) have applied heuristics based on knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans, but generally agree with our nonhuman primate results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled targets (Rajashekar, Bovik, & Cormack, 2006). They also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed a clear difference in power spectra between fixated versus nonfixated patches, and also between these spectral differences across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but this variance largely emerges from higher-order statistics quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note to the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished while that of task-relevant features is enhanced during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and the fact that the influence of bottom-up features is diminished during search, our study might not be ideal to examine the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, to model peripheral degradation for feature extraction and to deploy an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their predictive power of fixation locations, their use of Pearson correlation coefficients and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that maximizes an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness, horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search in naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time, and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority, and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19.
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.
Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.
Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34.
Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.
Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.
Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.
Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.
Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.
Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2.
Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.
Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.
Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.
Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.
Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533.

Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98 Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.
Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.
Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.
Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.
Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25.
Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.
Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.
Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7.
Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.
Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.
Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4.
Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.
Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7.

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).
Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13.
Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.
Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6.
Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.
't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.
Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4.
Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.
Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.
Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.
Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of a successful outcome given by

\[ p(y = 1 \mid X, b) = \frac{1}{1 + e^{-Xb}} \tag{A1} \]

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

\[ y \sim \mathrm{Bernoulli}\{\, p(y = 1 \mid X, b) \,\} \tag{A2} \]

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients b that minimize the negative log likelihood of the model, given by

\[ L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i)\log(1 - p_i) \tag{A3} \]


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
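A minimal sketch of this fitting step is shown below, with synthetic placeholder data in place of the real feature matrix and fixation labels; glmfit's stats output provides the coefficient, standard error, t, and p columns of the kind reported in Table 2.

```matlab
% Sketch: fit the logistic regression of Equation A1 with glmfit and read off
% per-feature statistics. X (features) and y (1 = fixated, 0 = shuffled
% control) are synthetic placeholders, not the study's data.
rng(2);
X = randn(3000, 5);   % saliency, edge-energy, relevance, orientedness, verticalness
y = double(rand(3000, 1) < 1 ./ (1 + exp(-(0.02 + X * [0; 0.3; 0.1; 0.03; 0.01]))));
[b, dev, stats] = glmfit(X, y, 'binomial', 'link', 'logit');   % b(1) is the constant (bias) term
disp(table(b, stats.se, stats.t, stats.p, ...
    'RowNames', {'Constant', 'Saliency', 'EdgeEnergy', 'Relevance', 'Orientedness', 'Verticalness'}, ...
    'VariableNames', {'b', 'SE', 't', 'p'}));
```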

Goodness of fit and model comparison usinglikelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

\[ D_0 = -2\log(L_0 / L_\mathrm{sat}) \tag{A4} \]
\[ D = -2\log(L / L_\mathrm{sat}) \tag{A5} \]

where L_0, L_sat, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

\[ D - D_0 = -2\log(L / L_0) \tag{A6} \]

A negative likelihood ratio suggests that the model under consideration significantly outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic, and can therefore be converted into a goodness-of-fit χ² measure given the number of independent variables in the model.
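The sketch below illustrates this comparison on synthetic placeholder data. Note that it computes the likelihood ratio as D0 − D (a positive number when the model beats the null), which is the negative of the D − D0 form in Equation A6 but carries the same information and matches the sign convention of the values reported in Table 2.

```matlab
% Sketch: likelihood-ratio comparison of a fitted model against the
% intercept-only (null) model, as in Equations A4-A6. For binary y, the
% saturated log-likelihood is 0, so deviance = -2 * log-likelihood.
rng(3);
X = randn(1500, 3);
y = double(rand(1500, 1) < 1 ./ (1 + exp(-X * [0.4; 0.1; 0])));
[~, dev]  = glmfit(X, y, 'binomial', 'link', 'logit');                                  % D, full model
[~, dev0] = glmfit(ones(1500, 1), y, 'binomial', 'link', 'logit', 'constant', 'off');   % D0, bias-only model
LR = dev0 - dev;                   % = 2*log(L/L0); large positive values favor the model
p  = 1 - chi2cdf(LR, size(X, 2));  % chi-squared test with one df per added feature
fprintf('LR chi2(%d) = %.2f, p = %.3g\n', size(X, 2), LR, p);
```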

Relative pseudo-R²

A related metric to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, R_D² = 1 − L/L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature, or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called the relative pseudo-R² (Fernandes et al., 2013). The relative measure is defined as 1 − L_full/L_partial, where L_full and L_partial are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
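A simplified Matlab sketch of these two quantities is given below; unlike the procedure described above, it evaluates the pseudo-R² in-sample rather than on held-out cross-validation folds, and the data and the choice of which feature to leave out are placeholders.

```matlab
% Sketch: McFadden pseudo-R2 and the relative (leave-one-feature-out) pseudo-R2
% used as the effect size in Figure 8. Log-likelihood is taken as -deviance/2,
% since the saturated log-likelihood is 0 for binary outcomes. Placeholder data.
rng(4);
X = randn(2000, 5);                                   % 5 features; column 2 plays "edge-energy"
y = double(rand(2000, 1) < 1 ./ (1 + exp(-X * [0; 0.35; 0.12; 0.04; 0.01])));
[~, devFull] = glmfit(X, y, 'binomial', 'link', 'logit');
[~, devNull] = glmfit(ones(2000, 1), y, 'binomial', 'link', 'logit', 'constant', 'off');
pseudoR2 = 1 - (-devFull / 2) / (-devNull / 2);       % McFadden's pseudo-R2, full vs. null model
k = 2;                                                % leave out feature k
[~, devPart] = glmfit(X(:, setdiff(1:5, k)), y, 'binomial', 'link', 'logit');
relPseudoR2 = 1 - (-devFull / 2) / (-devPart / 2);    % gain from adding feature k back
fprintf('pseudo-R2 = %.4f, relative pseudo-R2 (feature %d) = %.4f\n', pseudoR2, k, relPseudoR2);
```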



Nelder J A amp Wedderburn R W (1972) General-ized linear models In Journal of the RoyalStatistical Society Series A (General) (pp 370ndash384)

Oliva A amp Torralba A (2001) Modeling the shapeof the scene A holistic representation of the spatialenvelope International Journal of Computer Vision42 145ndash175

Phillips A amp Segraves M (2010) Predictive activityin the macaque frontal eye field neurons duringnatural scene searching Journal of Neurophysiolo-gy 103 1238ndash1252

Rajashekar U Bovik A C amp Cormack L K(2006) Visual search in noise Revealing the

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 18

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

influence of structural cues by gaze-contingentclassification image analysis Journal of Vision 6(4)7 379ndash386 httpwwwjournalofvisionorgcontent647 doi101167647 [PubMed] [Article]

Rajashekar U Cormack L K amp Bovik A C(2003) Image features that draw fixations InProceedings of the International Conference onImage Processing 2 (pp 313ndash316)

Reeder R R amp Peenlen M V (2013) The contentsof the search template for category-level search innatural scenes Journal of Vision 13(3)13 1ndash13httpwwwjournalofvisionorgcontent13313doi10116713313 [PubMed] [Article]

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network 10 341ndash350

Renninger L W Verghese P amp Coughlan J (2007)Where to look next Eye movements reduce localuncertainty Journal of Vision 7(3)6 1ndash17 httpwwwjournalofvisionorgcontent736 doi101167736 [PubMed] [Article]

Schutz A C Trommershauser J amp Gegenfurtner KR (2012) Dynamic integration of informationabout salience and value for saccadic eye move-ments Proceedings of the National Academy ofSciences 109(19) 7547ndash7552

Serences J T amp Yantis S (2006) Selective visualattention and perceptual coherence Trends inCognitive Sciences 10 38ndash45

lsquot Hart B M Schmidt H C E F Roth C ampEinhauser W (2013) Fixations on objects innatural scenes Dissociating importance from sa-lience Frontiers in Psychology 4

Tatler B W amp Vincent B T (2009) The prominenceof behavioral biases in eye guidance VisualCognition 17 1029ndash1054

Torralba A amp Oliva A (2003) Statistics of naturalimage categories Network Computation in NeuralSystems 14 391ndash412

Treisman A M amp Gelade G (1980) A feature-integration theory of attention Cognitive Psychol-ogy 12 97ndash136

Tseng P H Carmi R Cameron I G M MunozD P amp Itti L (2009) Quantifying center bias ofobservers in free viewing of dynamic naturalscenes Journal of Vision 9(7)4 1ndash17 httpwwwjournalofvisionorgcontent974 doi101167974 [PubMed] [Article]

Walther D amp Koch C (2006) Modeling attention tosalient proto-objects Neural Networks 19 1395ndash1407

Wilming N Harst S Schmidt N amp Konig P

(2013) Saccadic momentum of facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9 e1002871

Wolfe J M (1994) Guided search 20 A revisedmodel of visual search Psychonomic Bulletin andReview 1 202ndash238

Zelinsky G J (2008) A theory of eye movementsduring target acquisition Psychological Review115 787ndash835

Zelinsky G J Adeli H Peng Y amp Samaras D(2013) Modeling eye movements in a categoricalsearch task In Philosophical Transactions of theRoyal Society B 368 20130058

Zelinsky G J Rao R Hayhoe M amp Ballard D(1997) Eye movements reveal the spatio-temporaldynamics of visual search Psychological Science 8448ndash453

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classificationproblems with the two classes in our problemrepresented by fixated patches and shuffled controlsrespectively

In logistic regression the output is modeled as aBernoulli random variable with mean probability ofsuccessful outcome given by

pethy frac14 1jXbTHORN frac14 1

1thorn eXbethA1THORN

A Bernoulli random variable represents the outcomeof a biased coin toss in our problem the potentialoutcomes of the toss represent the possibilities ofwhether the patch was truly fixated or whether it was arandom shuffled control

yBernoulli pethy frac14 1jXbTHORNf g ethA2THORNThe probability of the outcome is then defined by the

joint influence of independent variables in the designmatrix X Here each column of X comprises the visualfeatures extracted from the image patches centered ateach fixation and each row is a fixation A bias term isalso included

The regression problem is then solved by estimatingthe coefficients b that minimize the negative loglikelihood of the model given by

L frac14 Xi

yilogethpiTHORN Xi

eth1 yiTHORNlogeth1 piTHORN ethA3THORN

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 19

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Unlike linear regression where the residuals arenormally distributed and a closed-form solution existslogistic regression is a convex problem in that it istypically solved using gradient-based methods Inpractice we implemented the method using Matlabrsquosglmfit function

Goodness of fit and model comparison usinglikelihood ratios

We computed the likelihood ratios between themodel under consideration and a null model with onlya bias term The likelihood ratios are defined in termsof the model deviances given by

D0 frac14 2logethL0=LsatTHORN ethA3THORN

D frac14 2logethL=LsatTHORN ethA4THORNwhere L0 Lsat and L are the likelihoods of the nullmodel (one that only predicts the bias in the data) thesaturated model (one that perfectly predicts the data)and the model under consideration The likelihoodratio is then given by

DD0 frac14 2logethL=L0THORN ethA5THORNA negative likelihood ratio suggests that the model

under consideration significantly outperforms the nullmodel The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into agoodness of fit v2 measure given the number ofindependent variables in the model

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R2 The idea of the pseudo-R2 metric is to map thelikelihood ratio into a [0 1] range thus offering anintuition similar to the R2 for normally distributeddata

Many definitions exist for the pseudo-R2 but weused McFaddenrsquos formula RD

2frac14 1 LL0 where L isthe log-likelihood of the model under considerationand L0 is the log-likelihood of the baseline (intercept-only) model (McFadden et al 1974)

To estimate the relative effect size of each feature ora subset of features we used a leave-one-feature-out(leave-some-features-out) technique a variant of bestsubsets of stepwise regression techniques For eachfeature we fit a partial model comprising all but thatparticular feature (or subset) and computed thepseudo-R2 of the partial model Then to obtain ameasure of the relative increase in predictive powerobtained by adding that feature (or subset) back to thepartial model we used a measure called relativepseudo-R2 (Fernandes et al 2013) The relativemeasure is defined as 1 LfullLpartial where Lfull andLpartial are the log-likelihoods of the full and partialmodels The confidence intervals on this statistic wereobtained by computing standard errors on 10 cross-validation folds the relative pseudo-R2 in each case wascomputed on the held-out test set

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 20

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

  • Introduction
  • Methods
  • f01
  • e01
  • f02
  • f03
  • e02
  • e03
  • e04
  • e05
  • e06
  • e07
  • f04
  • Results
  • f05
  • f06
  • f07
  • t01
  • Discussion
  • f08
  • t02a
  • t02b
  • t02c
  • t02d
  • Conclusions
  • n1
  • Baddeley1
  • Berg1
  • Bisley1
  • Borji1
  • Bravo1
  • Bravo2
  • Castelhano1
  • Cristino1
  • Ehinger1
  • Einhauser1
  • Einhauser2
  • Engmann1
  • Fecteau1
  • Fernandes1
  • Findlay1
  • Ganguli1
  • Geisler1
  • Gottlieb1
  • Gottlieb2
  • Hays1
  • Henderson1
  • Hooge1
  • Hwang1
  • Itti1
  • Itti2
  • Kanan1
  • Kano1
  • Kienzle1
  • Krieger1
  • Kuo1
  • LeMeur1
  • Malcolm1
  • McFadden1
  • Najemnik1
  • Najemnik2
  • Najemnik3
  • Navalpakkam1
  • Nelder1
  • Oliva1
  • Phillips1
  • Rajashekar1
  • Rajashekar2
  • Reeder1
  • Reinagel1
  • Renninger1
  • Schutz1
  • Serences1
  • ltHart1
  • Tatler2
  • Torralba1
  • Treisman1
  • Tseng1
  • Walther1
  • Wilming1
  • Wolfe1
  • Zelinsky1
  • Zelinsky2
  • Zelinsky3
  • Appendix A Logistic regression
  • e08
  • e09
  • e10
  • e11
  • e12
  • e13
Page 13: Pavan Ramkumar - Modeling peripheral visual acuity enables discovery …pavanramkumar.github.io/pdfs/08-Ramkumar_etal_JVis_2015.pdf · 2017-01-24 · Modeling peripheral visual acuity

the target orientation over a period of several weeks of performing the two related tasks. Furthermore, this long time-scale practice effect is not consistently detected by our analysis if we do not apply a retinal transform (bottom panels of Figure 8), suggesting once again that a biologically realistic model of peripheral vision is essential to understand eye movement behavior during search.

Figure 8. Relative influence of visual features on gaze behavior during search. The influence of each feature on eye gaze was obtained by measuring the extent to which a full model comprising all features would improve upon a partial model excluding that feature. Effect sizes (relative pseudo-R² on the test set) of features for the vertical (black) and horizontal (red) Gabor search tasks for each monkey. Error bars show standard errors of the mean across 10 cross-validation folds.

Discussion

We set out to examine the contents of the visual priority map employed in saccade guidance, and how they evolved over time, when monkeys searched for Gabor targets embedded in grayscale natural scenes. We applied a multivariate analysis technique to tease out the relative contributions of various task-relevant features while allowing for bottom-up features to explain them away. We found that, over short time scales of a single trial, monkeys preferred to fixate target locations with high saliency, edge–energy, relevance, and verticalness with short saccades, and locations with high orientedness with long saccades.

As the trial progressed, locations of higher bottom-up features, as well as those of higher relevance for the search task, were more likely to be fixated. Over longer time scales of weeks, animals were able to adapt their search strategy to seek out task-relevant features at saccade targets, and this improvement in strategy was associated with statistically significantly fewer fixations required to find the target in successful trials. Crucially, our findings only reveal themselves once realistic models of peripheral vision are taken into account.

Table 2. Regression tables for logistic regression decoding of 28 · 28 fixated patches and shuffled controls after the retinal transform was applied. Separate regression tables for each animal (MAS15, MAS16) and task (vertical or horizontal Gabor search) provide coefficient estimates, standard errors, t scores, and log-likelihoods, chi-squared (χ²) statistics, and pseudo-R²s (R²_D), as well as likelihood ratios for all leave-one-feature-out partial models. Significant features are emphasized in boldface.

Part A: MAS15, Vertical target. LR χ²(6) = 704.4, p = 0, log-likelihood = -23992, R²_D = 0.0143 (standardized pseudo-R², test set).

Variable       b        SE       t         p        Likelihood ratio   p
Constant       0.0337   0.0130    2.5918   0.0097
Saliency       0.0287   0.0131    2.1859   0.0341      4.8705          0.3187
Edge-energy    0.2880   0.0177   16.2238   0.0001    316.5875          0
Relevance      0.1889   0.0176   10.7034   0.0001    131.5860          0
Orientedness   0.0359   0.0163    2.2012   0.0417      4.9799          0.3190
Verticalness   0.0546   0.0161    3.3997   0.0009     11.6278          0.0258

Part B: MAS15, Horizontal target. LR χ²(6) = 704.4, p = 0, log-likelihood = -16714, R²_D = 0.0203 (standardized pseudo-R², test set).

Variable       b        SE       t         p        Likelihood ratio   p
Constant       0.0861   0.0165    5.2211   0.0001
Saliency       0.0032   0.0164    0.1958   0.7475      0.1470          0.9953
Edge-energy    0.5386   0.0253   21.2555   0.0001    584.6116          0
Relevance      0.0380   0.0197    1.9298   0.0621      3.9210          0.4398
Orientedness   0.1494   0.0198    7.5346   0.0001     57.1285          0.0001
Verticalness   0.1232   0.0201    6.1210   0.0001     37.5927          0.0001

Part C: MAS16, Vertical target. LR χ²(6) = 769.5, p = 0, log-likelihood = -10576, R²_D = 0.0346 (standardized pseudo-R², test set).

Variable       b        SE       t         p        Likelihood ratio   p
Constant       0.0775   0.0144    5.3812   0.0001
Saliency       0.0069   0.0143    0.4846   0.6132      0.4056          0.9694
Edge-energy    0.6241   0.0236   26.4207   0.0001    953.3296          0
Relevance      0.0025   0.0164    0.1493   0.7738      0.1008          0.9982
Orientedness   0.0707   0.0178    3.9771   0.0002     15.9725          0.0059
Verticalness   0.0224   0.0173    1.2956   0.2320      1.8565          0.7618

Part D: MAS16, Horizontal target. LR χ²(6) = 1099.4, p = 0, log-likelihood = -14001, R²_D = 0.0372 (standardized pseudo-R², test set).

Variable       b        SE       t         p        Likelihood ratio   p
Constant       0.0180   0.0108    1.6706   0.0954
Saliency       0.0085   0.0108    0.7809   0.4503      0.6797          0.9442
Edge-energy    0.2612   0.0131   19.8995   0.0001    455.2963          0
Relevance      0.1154   0.0133    8.6978   0.0001     83.1493          0.0001
Orientedness   0.0268   0.0120    2.2281   0.0319      5.0343          0.3035
Verticalness   0.0096   0.0126    0.7664   0.4607      0.6666          0.9450
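Appendix A describes the logistic-regression decoder behind these tables in detail. As a rough, hypothetical sketch of how per-feature entries like those above can be obtained with Matlab's glmfit (the function named in Appendix A), assuming a design matrix X of visual features (one row per patch) and a label vector y (1 = fixated, 0 = shuffled control); the variable names are illustrative, not the authors' code:

```matlab
% Illustrative only: coefficient estimates, standard errors, t scores, and
% p values of the kind reported in Table 2, from a logistic regression fit.
[b, dev, stats] = glmfit(X, y, 'binomial', 'link', 'logit');
disp([b stats.se stats.t stats.p])   % rows: constant term, then one row per feature
```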

The multivariate decoding accuracies leave some anomalies to be explained. In particular, they suggest that patches fixated by MAS16, the worse-performing monkey, are better discriminated than those fixated by MAS15. One potential explanation is that MAS16 relies on edge–energy more than MAS15 (Figure 8), which, as we have seen with the ROC analysis (Figure 6), is the strongest predictor of fixations. Further, MAS16 shows improved decoding with increased spatial extent of averaging. This might be because of the 2-D autocorrelation function of visual features around fixations: that of edge–energy might be more spread out than that of saliency, relevance, or other features.

One limitation of our approach is that although we have used multivariate regression to explain away correlated features, our data do not establish a causal relationship between certain visual features and gaze. For instance, in our current design, since Gabor targets have high edge–energy and we have not parametrically varied edge–energy content across experimental sessions, it is not possible to say whether edge–energy is a bottom-up or a top-down factor. By parametrically manipulating individual features in the scene and comparing gaze behavior across these conditions, it is possible to establish a more direct relationship between visual features at fixation and factors that influence saccade choice (see, e.g., Einhäuser et al., 2008; Engmann et al., 2009; 't Hart, Schmidt, Roth, & Einhäuser, 2013). In future studies, we intend to design experiments of this nature.

The natural world is seldom static, and motion cues are known to be predictive of gaze (Le Meur, Le Callet, & Barba, 2007). Therefore, it might be argued that searching in static natural scenes may not be representative of natural eye movement behavior. However, it was our intention to design tasks that are as comparable as possible to studies in humans, so that our findings on monkeys would be directly comparable.

For the stimuli in this study, we have not taken into account the effect of generic, task-independent biases that might inform eye movement behavior. For instance, it has been shown that human subjects tend to demonstrate a world-centered bias by often looking slightly below the horizon (Cristino & Baddeley, 2009). Future models would benefit from accounting for such ecological factors that bias gaze allocation.

An implicit assumption of our analysis framework (studying visual features at fixation to infer saccade-planning strategy) is that saccades are always made to planned locations that maximally resemble the target; that is, we assume that the oculomotor plan is a greedy strategy to maximize accuracy. This assumption has a couple of weaknesses. First, discrepancies between decision and action have been noted: it is known that saccades are sometimes made to intermediate locations between intended targets, followed by subsequent corrective saccades (Findlay, 1982; Zelinsky, 2008; Zelinsky, Rao, Hayhoe, & Ballard, 1997). Second, it has been proposed that saccades might be made to maximize information (minimize Shannon entropy) rather than accuracy (Najemnik & Geisler, 2005); further, both Bayes-optimal and heuristic search models that maximize information seem to reproduce the statistics of fixations made by human searchers better than a Bayesian model that only maximizes accuracy (Najemnik & Geisler, 2008, 2009). These are important phenomena that we have not addressed in our models of visual search.

Analysis of visual statistics at fixation

Although a large majority of priority map models (see models reviewed in Borji et al., 2013) have applied heuristics based on the knowledge of the visual system, a number of studies have analyzed visual information at fixated locations with the goal of reverse-engineering the features that attract eye gaze. These studies were done in humans but generally agree with our nonhuman primate's results. We discuss a few pertinent human studies below in relation to our findings.

We found that edge–energy predicts fixations well. Given that principal components of natural images resemble derivatives of 2-D Gaussians (Rajashekar, Cormack, & Bovik, 2003), which are not very different from edge detectors, gazing at high-energy patches might be interpreted as maximizing the variance efficiently.

We found that relevance (resemblance to the target) could weakly but significantly predict fixations during search. These findings are in agreement with an earlier study showing that when humans searched for simple geometric targets embedded in 1/f² noise, the content of fixated locations resembled the targets (Rajashekar, Bovik, & Cormack, 2006). They also showed considerable intersubject variation in search strategy, with two subjects selecting saccade targets based on the search-target shape and one subject simply using a size criterion. We observe different search strategies in monkeys as well, with saliency and relevance (target similarity) predicting fixations to different relative extents in the two monkeys.

We observed a clear difference in power spectra between fixated versus non-fixated patches, and also between these spectral differences across the two search tasks. However, it is important to note that power spectra predominantly capture second-order statistics and might therefore not capture the full extent of potential variance in fixated patches. Indeed, Krieger, Rentschler, Hauske, Schill, and Zetzsche (2000) showed that locations with larger spatial variance are more likely to be fixated, but this variance largely emerges from higher-order statistics, quantifiable using bispectral densities, and is therefore invisible in the power spectra. Their results provide a cautionary note to the interpretation of the power-spectral differences that we observed in our analysis.

A number of studies in humans support our finding that the influence of bottom-up features is diminished while that of task-relevant features is enhanced during search in natural scenes (Castelhano & Heaven, 2010; Ehinger et al., 2009; Einhäuser et al., 2008; Henderson et al., 2007; Malcolm & Henderson, 2009; Schütz, Trommershäuser, & Gegenfurtner, 2012). Although none of these studies have explicitly modeled the degraded representation of visual information in the periphery, they are in agreement with our finding that saliency gets explained away by edge–energy.

Studies that try to characterize the features humans use while searching or free-viewing artificial or natural stimuli are generally consistent with the results we obtain in our nonhuman primate study. In the light of these studies on humans, our study, while by no means methodologically novel, supports the existence of very similar eye-movement strategies in nonhuman primates. This is a necessary first step towards understanding the neural circuits that compute these decisions using invasive electrophysiology. An additional comparative study repeating the same task in humans might be a useful bridge between existing studies of natural scene search in humans and future results from primate electrophysiology studies.

The contents of the bottom-up priority map

Given that our experiment was a search task, and the fact that the influence of bottom-up features is diminished during search, our study might not be ideal to examine the contents of the bottom-up priority map. However, the explaining-away analysis built into multivariate logistic regression enabled us to tease apart the relative influences of a multiscale saliency metric and edge–energy.

Multivariate analysis might help resolve conflicting findings in many human studies of eye movement behavior (Baddeley & Tatler, 2006; Einhäuser & König, 2003; Reinagel & Zador, 1999). Although some of these studies involve free-viewing tasks and are therefore not directly comparable, our findings agree best with the study by Baddeley and Tatler (2006), who found that edge-detectors at two different spatial scales better predicted fixation behavior than contrast or luminance. In contrast, a free-viewing study comparing primate and human fixations (Berg et al., 2009) suggested that saliency triumphs over luminance contrast. Our approach, to model peripheral degradation for feature extraction and to deploy an explaining-away technique for data analysis, could potentially resolve this disagreement.

The contents of the top-down search template

A long-standing hypothesis in the area of visual search (Treisman & Gelade, 1980; Wolfe, 1994) suggests that search is guided by the matching of incoming visual input to an internal search template. The extent to which the internal template approximates the target is a matter of current debate. On the one hand, more information in the internal template seems to result in better search performance (e.g., Hwang et al., 2009). On the other hand, if the internal template is approximate, it allows for variability in target features across trials (e.g., Bravo & Farid, 2009). Although Bravo and Farid (2009) assessed various task-relevant candidate features (including similarity of luminance contrast, spatial frequency, orientation, and gradient of the search target) for their predictive power of fixation locations, their use of the Pearson correlation coefficient and ROC areas prevents a fair comparison of the relative predictive power of correlated features. Further, the extent to which the search target can be described by low-level visual features also influences the internal template (Bravo & Farid, 2012; Reeder & Peelen, 2013), making generalizations across tasks difficult. In our study, the search target is a fixed Gabor patch that can be described easily by low-level features. Our results show that relevance (a metric that maximizes an exact match between the Gabor and the local visual features), orientedness, and the orientation-sensitive metric of verticalness can predict fixations to varying degrees in the two monkeys. Thus, even in the restricted context of our Gabor search task, it seems that monkeys adopt different internal templates, comprising either all of the target features (relevance) or only a subset of its features (verticalness, horizontalness).

Conclusions

We studied the problem of why we look where we do by studying search in naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority, and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavanramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.
Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19. [PubMed] [Article]
Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.
Borji, A., Sihite, N. D., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.
Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34. [PubMed] [Article]
Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.
Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.
Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.
Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.
Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.
Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2. [PubMed] [Article]
Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.
Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.
Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.
Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.
Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]
Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West'98 Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.
Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.
Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.
Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.
Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.
Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]
Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.
Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.
Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye tracking study. Proceedings of the Royal Society B, 276, 1949–1955.
Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]
Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.
Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.
Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.
Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.
Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]
Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.
Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).
Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 370–384.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.
Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.
Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]
Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).
Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]
Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.
Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]
Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.
Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.
't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.
Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.
Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.
Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]
Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.
Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.
Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.
Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.
Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.
Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of successful outcome given by

p(y = 1 | X, b) = 1 / (1 + e^(-Xb))    (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

y ~ Bernoulli{p(y = 1 | X, b)}    (A2)

The probability of the outcome is then defined by the joint influence of independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.
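As a concrete sketch of this setup (illustrative variable names only, not the authors' code), the design matrix and outcome vector could be assembled as follows; Matlab's glmfit, used below, prepends the bias term automatically.

```matlab
% Assumed inputs: featFix (nFix x nFeatures) and featCtrl (nCtrl x nFeatures),
% the visual features extracted at fixated patches and at shuffled controls.
X = [featFix; featCtrl];           % one row per patch, one column per feature
y = [ones(size(featFix, 1), 1); ...
     zeros(size(featCtrl, 1), 1)]; % 1 = truly fixated, 0 = shuffled control
X = zscore(X);                     % put features on a common scale (optional)
```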

The regression problem is then solved by estimating the coefficients b that minimize the negative log likelihood of the model, given by

L = -Σ_i y_i log(p_i) - Σ_i (1 - y_i) log(1 - p_i)    (A3)

Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
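A minimal sketch of this fitting step, assuming the design matrix X and labels y from the previous block (glmfit is the function named in the text; everything else is illustrative):

```matlab
b   = glmfit(X, y, 'binomial', 'link', 'logit'); % bias term added by glmfit
p   = glmval(b, X, 'logit');                     % p(y = 1 | X, b), Equation A1
tol = 1e-12;                                     % guard against log(0)
LL  = sum(y .* log(p + tol) + (1 - y) .* log(1 - p + tol));
% -LL is the negative log likelihood of Equation A3 that the fit minimizes.
```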

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

D_0 = -2 log(L_0 / L_sat)    (A3)

D = -2 log(L / L_sat)    (A4)

where L_0, L_sat, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

D - D_0 = -2 log(L / L_0)    (A5)

A negative likelihood ratio suggests that the model under consideration significantly outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
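Under the same assumptions as the previous sketches, the likelihood-ratio comparison against the intercept-only null model might look like this (chi2cdf is a Statistics Toolbox function; the degrees of freedom equal the number of features added to the null model):

```matlab
p0     = mean(y);                                    % null model: constant rate
LL0    = sum(y .* log(p0) + (1 - y) .* log(1 - p0)); % null log likelihood
LRstat = 2 * (LL - LL0);                             % 2*log(L/L0) = -(D - D_0)
df     = size(X, 2);                                 % features beyond the bias
pValue = 1 - chi2cdf(LRstat, df);                    % goodness-of-fit test
```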

Relative pseudo-R²

A related metric to the likelihood ratio is the pseudo-R². The idea of the pseudo-R² metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R² for normally distributed data.

Many definitions exist for the pseudo-R², but we used McFadden's formula, R²_D = 1 - L/L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R² of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called relative pseudo-R² (Fernandes et al., 2013). The relative measure is defined as 1 - L_full/L_partial, where L_full and L_partial are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors on 10 cross-validation folds; the relative pseudo-R² in each case was computed on the held-out test set.
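A sketch of this relative pseudo-R² computation with 10 cross-validation folds, under the same assumed X and y (cvpartition is a Statistics Toolbox utility; the loop structure illustrates the procedure, not the authors' implementation):

```matlab
nFolds = 10;
nFeat  = size(X, 2);
cvp    = cvpartition(numel(y), 'KFold', nFolds);
relR2  = zeros(nFolds, nFeat);
tol    = 1e-12;
for k = 1:nFolds
    tr = training(cvp, k);  te = test(cvp, k);
    bFull  = glmfit(X(tr, :), y(tr), 'binomial', 'link', 'logit');
    pFull  = glmval(bFull, X(te, :), 'logit');
    llFull = sum(y(te) .* log(pFull + tol) + (1 - y(te)) .* log(1 - pFull + tol));
    for j = 1:nFeat
        keep   = setdiff(1:nFeat, j);                % leave feature j out
        bPart  = glmfit(X(tr, keep), y(tr), 'binomial', 'link', 'logit');
        pPart  = glmval(bPart, X(te, keep), 'logit');
        llPart = sum(y(te) .* log(pPart + tol) + (1 - y(te)) .* log(1 - pPart + tol));
        relR2(k, j) = 1 - llFull / llPart;           % relative pseudo-R2, test fold
    end
end
% mean(relR2, 1) gives per-feature effect sizes; std(relR2, 0, 1) / sqrt(nFolds)
% gives standard errors of the kind plotted as error bars in Figure 8.
```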

Page 14: Pavan Ramkumar - Modeling peripheral visual acuity enables discovery …pavanramkumar.github.io/pdfs/08-Ramkumar_etal_JVis_2015.pdf · 2017-01-24 · Modeling peripheral visual acuity

features while allowing for bottom-up features toexplain them away We found that over short timescales of a single trial monkeys preferred to fixatetarget locations with high saliency edgendashenergyrelevance and verticalness with short saccades andlocations with high orientedness with long saccades

As the trial progressed locations of higher bottom-upfeatures as well those of higher relevance for the searchtask were more likely to be fixated Over longer timescales of weeks animals were able to adapt their searchstrategy to seek our task-relevant features at saccadetargets and this improvement in strategy was associ-

Variable b SE t p Likelihood ratio p

Constant 00180 00108 16706 00954

Saliency 00085 00108 07809 04503 06797 09442

Edge-energy 02612 00131 198995 00001 4552963 0

Relevance 01154 00133 86978 00001 831493 00001

Orientedness 00268 00120 22281 00319 50343 03035

Verticalness 00096 00126 07664 04607 06666 09450

Table 2 Regression tables for logistic regression decoding of 28 middot 28 fixated patches and shuffled controls after the retinal transformwas applied Separate regression tables for each animal (MAS15 MAS16) and task (vertical or horizontal Gabor search) providecoefficient estimates standard errors t scores and log-likelihoods chi-squared (v2) statistics and pseudo-R2s (RD

2) as well aslikelihood ratios for all leave-one-feature-out partial models Significant features are emphasized in boldface

Part A MAS15 Vertical target LR v2(6)frac14 7044 pfrac14 0 log-likelihoodfrac1423992 RD2frac14 00143 Standardized Pseudo-R2 (test

set)

Variable b SE t p Likelihood ratio p

Constant 00337 00130 25918 00097

Saliency 00287 00131 21859 00341 48705 03187

Edge-energy 02880 00177 162238 00001 3165875 0

Relevance 01889 00176 107034 00001 1315860 0

Orientedness 00359 00163 22012 00417 49799 03190

Verticalness 00546 00161 33997 00009 116278 00258

Part B MAS15 Horizontal target LR v2(6)frac14 7044 pfrac14 0 log-likelihoodfrac1416714 RD2frac14 00203 Standardized Pseudo-R2

(test set)

Variable b SE t p Likelihood ratio p

Constant 00861 00165 52211 00001

Saliency 00032 00164 01958 07475 01470 09953

Edge-energy 05386 00253 212555 00001 5846116 0

Relevance 00380 00197 19298 00621 39210 04398

Orientedness 01494 00198 75346 00001 571285 00001

Verticalness 01232 00201 61210 00001 375927 00001

Part C MAS16 Vertical target LR v2(6)frac14 7695 pfrac14 0 log-likelihoodfrac1410576 RD2frac14 00346 Standardized Pseudo-R2 (test

set)

Variable b SE t p Likelihood ratio p

Constant 00775 00144 53812 00001

Saliency 00069 00143 04846 06132 04056 09694

Edge-energy 06241 00236 264207 00001 9533296 0

Relevance 00025 00164 01493 07738 01008 09982

Orientedness 00707 00178 39771 00002 159725 00059

Verticalness 00224 00173 12956 02320 18565 07618

Part D MAS16 Horizontal target LR v2(6)frac14 10994 pfrac14 0 log-likelihoodfrac1414001 RD2frac14 00372 Standardized Pseudo-R2

(test set)

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 14

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

ated with statistically significantly fewer fixationsrequired to find the target in successful trials Cruciallyour findings only reveal themselves once realisticmodels of peripheral vision are taken into account

The multivariate decoding accuracies leave someanomalies to be explained In particular they suggestthat patches fixated by MAS16 the worse-performingmonkey are better discriminated than those fixated byMAS15 One potential explanation is that MAS16relies on edgendashenergy more than MAS15 (Figure 8)which as we have seen with the ROC analysis (Figure6) is the strongest predictor of fixations FurtherMAS16 show improved decoding with increased spatialextent of averaging This might be because of the 2-Dauto-correlation function of visual features aroundfixations that of edgendashenergy might be more spreadthan that of saliency or relevance or other features

One limitation of our approach is that although wehave used multivariate regression to explain awaycorrelated features our data do not establish a causalrelationship between certain visual features and gazeFor instance in our current design since Gabor targetshave high edgendashenergy and we have not parametricallyvaried edgendashenergy content across experimental ses-sions it is not possible to say whether edgendashenergy is abottom-up or a top-down factor By parametricallymanipulating individual features in the scene andcomparing gaze behavior across these conditions it ispossible to establish a more direct relationship betweenvisual features at fixation and factors that influencesaccade choice (see eg Einhauser et al 2008Engmann et al 2009 lsquot Hart Schmidt Roth ampEinhauser 2013) In future studies we intend to designexperiments of this nature

The natural world is seldom static and motion cuesare known to be predictive of gaze (Le Meur Le Calletamp Barba 2007) Therefore it might be argued thatsearching in static natural scenes may not be repre-sentative of natural eye movement behavior Howeverit was our intention to design tasks that are ascomparable as possible to studies in humans so thatour findings on monkeys would be directly comparable

For the stimuli in this study we have not taken intoaccount the effect of generic task-independent biasesthat might inform eye movement behavior Forinstance it has been shown that human subjects tend todemonstrate a world-centered bias by often lookingslightly below the horizon (Cristino amp Baddeley 2009)Future models would benefit from accounting for suchecological factors that bias gaze allocation

An implicit assumption of our analysis frameworkmdashstudying visual features at fixation to infer saccade-planning strategymdashis that saccades are always made toplanned locations that maximally resemble the targetthat is we assume that the oculomotor plan is a greedystrategy to maximize accuracy This assumption has a

couple of weaknesses First discrepancies betweendecision and action have been noted it is known thatsaccades are sometimes made to intermediate locationsbetween intended targets followed by subsequentcorrective saccades (Findlay 1982 Zelinsky 2008Zelinsky Rao Hayhoe amp Ballard 1997) Second ithas been proposed that saccades might be made tomaximize information (minimize Shannon entropy)rather than accuracy (Najemnik amp Geisler 2005)further both Bayes-optimal and heuristic searchmodels that maximize information seem to reproducestatistics of fixations made by human searchers betterthan a Bayesian model that only maximizes accuracy(Najemnik amp Geisler 2008 2009) These are importantphenomena that we have not addressed in our modelsof visual search

Analysis of visual statistics at fixation

Although a large majority of priority map models(see models reviewed in Borji et al 2013) have appliedheuristics based on the knowledge of the visual systema number of studies have analyzed visual informationat fixated locations with the goal of reverse-engineeringthe features that attract eye gaze These studies weredone in humans but generally agree with our nonhu-man primatersquos results We discuss a few pertinenthuman studies below in relation to our findings

We found that edgendashenergy predicts fixations wellGiven that principal components of natural imagesresemble derivatives of 2-D Gaussians (RajashekarCormack amp Bovik 2003) which are not very differentfrom edge detectors gazing at high-energy patchesmight be interpreted as maximizing the varianceefficiently

We found that relevance (resemblance to the target)could weakly but significantly predict fixations duringsearch These findings are in agreement with an earlierstudy showing that when humans searched for simplegeometric targets embedded in 1f2 noise the content offixated locations resembled targets (Rajashekar Bovikamp Cormack 2006) They also showed considerableintersubject variation in search strategy with twosubjects selecting saccade targets based on the search-target shape and one subject simply using a sizecriterion We observe different search strategies inmonkeys as well with saliency and relevance (targetsimilarity) predicting fixations to different relativeextents in the two monkeys

We observed clear difference in power spectrabetween fixated versus non-fixated patches and alsobetween these spectral differences across the two searchtasks However it is important to note that powerspectra predominantly capture second order statisticsand might therefore not capture the full extent of

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 15

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

potential variance in fixated patches Indeed KriegerRentschler Hauske Schill and Zetzsche (2000) showedthat locations with larger spatial variance are morelikely to be fixated but this variance largely emergesfrom higher-order statistics quantifiable using bispec-tral densities and is therefore invisible in the powerspectra Their results provide a cautionary note to theinterpretation of power-spectral differences that weobserved in our analysis

A number of studies in humans support our findingthat the influence of bottom-up features is diminishedwhile that of task-relevant features is enhanced duringsearch in natural scenes (Castelhano amp Heaven 2010Ehinger et al 2009 Einhauser et al 2008 Hendersonet al 2007 Malcolm amp Henderson 2009 SchutzTrommershauser amp Gegenfurtner 2012) Althoughnone of these studies have explicitly modeled thedegraded representation of visual information in theperiphery they are in agreement with our finding thatsaliency gets explained away by edgendashenergy

Studies that try to characterize the features humansuse while searching or free-viewing artificial or naturalstimuli are generally consistent with the results weobtain in our nonhuman primate study In the light ofthese studies on humans our study while by no meansmethodologically novel supports the existence of verysimilar eye-movement strategies in nonhuman pri-mates This is a necessary first step towards under-standing the neural circuits that compute thesedecisions using invasive electrophysiology An addi-tional comparative study repeating the same task inhumans might be a useful bridge between existingstudies of natural scene search in humans and futureresults from primate electrophysiology studies

The contents of the bottom-up priority map

Given that our experiment was a search task and thefact that the influence of bottom-up features isdiminished during search our study might not be idealto examine the contents of the bottom-up priority mapHowever the explaining-away analysis built intomultivariate logistic regression enabled us to teaseapart the relative influences of a multiscale saliencymetric and edgendashenergy

Multivariate analysis might help resolve conflictingfindings in many human studies of eye movementbehavior (Baddeley amp Tatler 2006 Einhauser amp Konig2003 Reinagel amp Zador 1999) Although some of thesestudies involve free-viewing tasks and are therefore notdirectly comparable our findings agree best with thestudy by Baddeley and Tatler (2006) who found thatedge-detectors at two different spatial scales betterpredicted fixation behavior than contrast or luminanceIn contrast a free-viewing study comparing primate

and human fixations (Berg et al 2009) suggested thatsaliency triumphs over luminance contrast Our ap-proach to model peripheral degradation for featureextraction and to deploy an explaining-away techniquefor data analysis could potentially resolve this dis-agreement

The contents of the top-down search template

A long-standing hypothesis in the area of visualsearch (Treisman amp Gelade 1980 Wolfe 1994)suggests that search is guided by the matching ofincoming visual input to an internal search templateThe extent to which the internal template approximatesthe target is a matter of current debate On the onehand more information in the internal template seemsto result in better search performance (eg Hwang etal 2009) On the other hand if the internal template isapproximate it allows for variability in target featuresacross trials (eg Bravo amp Farid 2009) AlthoughBravo and Farid (2009) assessed various task-relevantcandidate features (including similarity of luminancecontrast spatial frequency orientation and gradient ofthe search target) for their predictive power of fixationlocations their use of Pearson correlation coefficientand ROC areas prevents a fair comparison of therelative predictive power of correlated features Fur-ther the extent to which the search target can bedescribed by low-level visual features also influences theinternal template (Bravo amp Farid 2012 Reeder ampPeelen 2013) making generalizations across tasksdifficult In our study the search target is a fixed Gaborpatch that can be described easily by low-level featuresOur results show that relevance (a metric thatmaximizes an exact match between the Gabor and thelocal visual features) orientedness and the orientation-sensitive metric of verticalness can predict fixations tovarying degrees in the two monkeys Thus even in therestricted context of our Gabor search task it seemsthat monkeys adopt different internal templates com-prising either all of the target features (relevance) oronly a subset of its features (verticalness horizontal-ness)

Conclusions

We studied the problem of why we look where we do by studying search in naturalistic conditions. By modeling peripheral visual degradation in retinocentric coordinates, and by using nuanced analysis methods, we were able to quantify the relative extents to which various bottom-up and task-relevant image features influenced eye movements. These analysis methods also enabled us to distinguish between features that were important for fixations in general and features that were specifically important for the search tasks studied. We showed how this search strategy evolves over time and how distinct factors are important at different time scales. Our study thus establishes essential groundwork to examine the neural representation of priority and the neural mechanisms by which a saccade decision is computed, using neurophysiological methods in nonhuman primate models.

Keywords: nonhuman primates, eye movements, natural scenes, visual search, priority map, peripheral vision, saliency, edge–energy, relevance, orientation statistics

Acknowledgments

This work was funded by NIH Grant 1R01EY021579-01A1. We thank B. M. 't Hart for useful discussions and comments on the manuscript.

Commercial relationships: none.
Corresponding author: Pavan Ramkumar.
Email: pavan.ramkumar@northwestern.edu.
Address: Department of Physical Medicine and Rehabilitation, Northwestern University and Rehabilitation Institute of Chicago, Chicago, IL, USA.

Footnote

1. For any feature, an AUC > 0.5 suggests that M(fixated) > M(controls).

References

Baddeley, R. J., & Tatler, B. W. (2006). High frequency edges (but not contrast) predict where we fixate: A Bayesian system identification analysis. Vision Research, 46, 2824–2833.

Berg, D. J., Boehnke, S. E., Marino, R. A., Munoz, D. P., & Itti, L. (2009). Free viewing of dynamic stimuli by humans and monkeys. Journal of Vision, 9(5):19, 1–15, http://www.journalofvision.org/content/9/5/19, doi:10.1167/9.5.19. [PubMed] [Article]

Bisley, J. W., & Goldberg, M. E. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of Neuroscience, 33, 1–21.

Borji, A., Sihite, D. N., & Itti, L. (2013). Quantitative analysis of human–model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, 23, 55–69.

Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9, http://www.journalofvision.org/content/9/1/34, doi:10.1167/9.1.34. [PubMed] [Article]

Bravo, M. J., & Farid, H. (2012). Task demands determine the specificity of the search template. Attention, Perception, & Psychophysics, 74, 124–131.

Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, & Psychophysics, 72, 1283–1297.

Cristino, F., & Baddeley, R. J. (2009). The nature of visual representations involved in eye movements when walking down the street. Visual Cognition, 17, 880–903.

Ehinger, K. A., Hidalgo-Sotelo, B., & Torralba, A. (2009). Modeling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 17, 945–978.

Einhäuser, W., & König, P. (2003). Does luminance-contrast contribute to a saliency map for overt visual attention? European Journal of Neuroscience, 17, 1089–1097.

Einhäuser, W., Rutishauser, U., & Koch, C. (2008). Task-demands can immediately reverse the effects of sensory-driven saliency in complex visual stimuli. Journal of Vision, 8(2):2, 1–19, http://www.journalofvision.org/content/8/2/2, doi:10.1167/8.2.2. [PubMed] [Article]

Engmann, S., 't Hart, B. M., Sieren, T., Onat, S., König, P., & Einhäuser, W. (2009). Saliency on a natural scene background: Effects of color and luminance contrast add linearly. Attention, Perception, & Psychophysics, 71, 1337–1352.

Fecteau, J. H., & Munoz, D. P. (2006). Salience, relevance, and firing: A priority map for target selection. Trends in Cognitive Sciences, 10, 382–390.

Fernandes, H. L., Stevenson, I. H., Phillips, A. N., Segraves, M. A., & Kording, K. P. (2014). Saliency and saccade encoding in the frontal eye field during natural scene search. Cerebral Cortex, 24, 3232–3245.

Findlay, J. M. (1982). Global visual processing for saccadic eye movements. Vision Research, 22, 1033–1045.

Ganguli, D., Freeman, J., Rajashekar, U., & Simoncelli, E. P. (2010). Orientation statistics at fixation. Journal of Vision, 10(7):533, http://www.journalofvision.org/content/10/7/533, doi:10.1167/10.7.533. [Abstract]

Geisler, W. S., & Perry, J. S. (1998). Real-time foveated multiresolution system for low-bandwidth video communication. In Photonics West '98: Electronic Imaging (pp. 294–305). International Society for Optics and Photonics.

Gottlieb, J. (2012). Attention, learning, and the value of information. Neuron, 76, 281–295.

Gottlieb, J., Oudeyer, P. Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.

Hays, A. V., Richmond, B. J., & Optican, L. M. (1982). Unix-based multiple-process system for real-time data acquisition and control. WESCON Conference Proceedings, 2, 1–10.

Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & Mack, M. L. (2007). Visual saliency does not account for eye movements during visual search in real-world scenes. In R. P. G. van Gompel, M. Fischer, W. Murray, & R. W. Hill (Eds.), Eye movements: A window on mind and brain (pp. 537–562). Amsterdam, The Netherlands: Elsevier.

Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.

Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]

Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.

Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye-tracking study. Proceedings of the Royal Society B, 276, 1949–1955.

Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]

Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.

Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.

Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.

Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.

Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.

Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]

Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.

Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).

Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 135, 370–384.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Phillips, A., & Segraves, M. (2010). Predictive activity in macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.

Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).

Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]

Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.

Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]

Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.

't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.

Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.

Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.

Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.

Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.

Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.

Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatiotemporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of a successful outcome given by

p(y = 1 \mid X, b) = \frac{1}{1 + e^{-Xb}}     (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

y \sim \mathrm{Bernoulli}\{ p(y = 1 \mid X, b) \}     (A2)

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.
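As a minimal illustration (not the authors' code; X, Xd, and b are assumed variable names), Equation A1 applied to such a design matrix amounts to a single matrix-vector product followed by the logistic function:

    % Minimal sketch of Equation A1. X holds one row of visual features per
    % fixated or control patch; the appended column of ones is the bias term.
    Xd = [X, ones(size(X, 1), 1)];    % design matrix with bias column
    p  = 1 ./ (1 + exp(-Xd * b));     % p(y = 1 | Xd, b) for every patch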

The regression problem is then solved by estimating the coefficients b that minimize the negative log likelihood of the model, given by

L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i) \log(1 - p_i)     (A3)

Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression has no closed-form solution; it is a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
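As a concrete sketch of this fitting step (assumed variable names, not the authors' code), glmfit and glmval from the Statistics Toolbox fit the coefficients and return the fitted probabilities, from which the negative log likelihood of Equation A3 follows directly:

    % y is 1 for fixated patches and 0 for shuffled controls; X contains only the
    % feature columns, since glmfit prepends its own intercept (bias) term.
    [b, dev, stats] = glmfit(X, y, 'binomial', 'link', 'logit');
    p = glmval(b, X, 'logit');                                 % fitted p(y = 1 | X, b)
    negLogLik = -sum(y .* log(p) + (1 - y) .* log(1 - p));     % Equation A3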

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

D_0 = -2 \log(L_0 / L_{sat})     (A4)

D = -2 \log(L / L_{sat})     (A5)

where L_0, L_{sat}, and L are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration. The likelihood ratio is then given by

D - D_0 = -2 \log(L / L_0)     (A6)

A negative likelihood ratio indicates that the model under consideration outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit χ² measure, given the number of independent variables in the model.
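For example, under the assumption that the full-model and null-model deviances are taken from glmfit as above (variable names are illustrative), the likelihood ratio and its chi-squared test could be computed as follows:

    % glmfit's second output is the model deviance, D = -2*log(L/Lsat); an
    % intercept-only fit gives the null deviance D0.
    [~, devFull] = glmfit(X, y, 'binomial');                                 % full model
    [~, devNull] = glmfit(ones(size(y)), y, 'binomial', 'constant', 'off');  % intercept-only null model
    lr   = devFull - devNull;                          % D - D0 = -2*log(L/L0); negative favors the full model
    pval = 1 - chi2cdf(devNull - devFull, size(X, 2)); % chi-squared test, one degree of freedom per feature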

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R2. The idea of the pseudo-R2 metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R2 for normally distributed data.

Many definitions exist for the pseudo-R2, but we used McFadden's formula, R_D^2 = 1 - L / L_0, where L is the log-likelihood of the model under consideration and L_0 is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).
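A minimal sketch, assuming the deviances computed above and one observation per row (so that the saturated log-likelihood is zero and each deviance is -2 times the corresponding log-likelihood):

    % McFadden's pseudo-R^2: with a saturated log-likelihood of zero, the ratio
    % of log-likelihoods equals the ratio of deviances.
    pseudoR2 = 1 - devFull / devNull;    % R_D^2 = 1 - L/L0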

To estimate the relative effect size of each feature or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets or stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R2 of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called relative pseudo-R2 (Fernandes et al., 2014). The relative measure is defined as 1 - L_{full} / L_{partial}, where L_{full} and L_{partial} are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R2 in each case was computed on the held-out test set.
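A sketch of this procedure, assuming j indexes the feature being left out and X and y are as above (illustrative names; cvpartition, glmfit, and glmval are standard Statistics Toolbox functions):

    keep = setdiff(1:size(X, 2), j);              % feature columns of the partial model
    cv = cvpartition(numel(y), 'KFold', 10);      % 10 cross-validation folds
    relR2 = zeros(cv.NumTestSets, 1);
    for k = 1:cv.NumTestSets
        tr = training(cv, k);  te = test(cv, k);
        bFull    = glmfit(X(tr, :),    y(tr), 'binomial');
        bPartial = glmfit(X(tr, keep), y(tr), 'binomial');
        pF = glmval(bFull,    X(te, :),    'logit');
        pP = glmval(bPartial, X(te, keep), 'logit');
        llF = sum(y(te) .* log(pF) + (1 - y(te)) .* log(1 - pF));  % held-out log-likelihoods
        llP = sum(y(te) .* log(pP) + (1 - y(te)) .* log(1 - pP));
        relR2(k) = 1 - llF / llP;                 % relative pseudo-R^2 for feature j
    end
    sem = std(relR2) / sqrt(numel(relR2));        % standard error across folds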

Studies that try to characterize the features humansuse while searching or free-viewing artificial or naturalstimuli are generally consistent with the results weobtain in our nonhuman primate study In the light ofthese studies on humans our study while by no meansmethodologically novel supports the existence of verysimilar eye-movement strategies in nonhuman pri-mates This is a necessary first step towards under-standing the neural circuits that compute thesedecisions using invasive electrophysiology An addi-tional comparative study repeating the same task inhumans might be a useful bridge between existingstudies of natural scene search in humans and futureresults from primate electrophysiology studies

The contents of the bottom-up priority map

Given that our experiment was a search task and thefact that the influence of bottom-up features isdiminished during search our study might not be idealto examine the contents of the bottom-up priority mapHowever the explaining-away analysis built intomultivariate logistic regression enabled us to teaseapart the relative influences of a multiscale saliencymetric and edgendashenergy

Multivariate analysis might help resolve conflictingfindings in many human studies of eye movementbehavior (Baddeley amp Tatler 2006 Einhauser amp Konig2003 Reinagel amp Zador 1999) Although some of thesestudies involve free-viewing tasks and are therefore notdirectly comparable our findings agree best with thestudy by Baddeley and Tatler (2006) who found thatedge-detectors at two different spatial scales betterpredicted fixation behavior than contrast or luminanceIn contrast a free-viewing study comparing primate

and human fixations (Berg et al 2009) suggested thatsaliency triumphs over luminance contrast Our ap-proach to model peripheral degradation for featureextraction and to deploy an explaining-away techniquefor data analysis could potentially resolve this dis-agreement

The contents of the top-down search template

A long-standing hypothesis in the area of visualsearch (Treisman amp Gelade 1980 Wolfe 1994)suggests that search is guided by the matching ofincoming visual input to an internal search templateThe extent to which the internal template approximatesthe target is a matter of current debate On the onehand more information in the internal template seemsto result in better search performance (eg Hwang etal 2009) On the other hand if the internal template isapproximate it allows for variability in target featuresacross trials (eg Bravo amp Farid 2009) AlthoughBravo and Farid (2009) assessed various task-relevantcandidate features (including similarity of luminancecontrast spatial frequency orientation and gradient ofthe search target) for their predictive power of fixationlocations their use of Pearson correlation coefficientand ROC areas prevents a fair comparison of therelative predictive power of correlated features Fur-ther the extent to which the search target can bedescribed by low-level visual features also influences theinternal template (Bravo amp Farid 2012 Reeder ampPeelen 2013) making generalizations across tasksdifficult In our study the search target is a fixed Gaborpatch that can be described easily by low-level featuresOur results show that relevance (a metric thatmaximizes an exact match between the Gabor and thelocal visual features) orientedness and the orientation-sensitive metric of verticalness can predict fixations tovarying degrees in the two monkeys Thus even in therestricted context of our Gabor search task it seemsthat monkeys adopt different internal templates com-prising either all of the target features (relevance) oronly a subset of its features (verticalness horizontal-ness)

Conclusions

We studied the problem of why we look where we doby studying search in naturalistic conditions Bymodeling peripheral visual degradation in retinocentriccoordinates and by using nuanced analysis methodswe were able to quantify the relative extents to whichvarious bottom-up and task-relevant image featuresinfluenced eye movements These analysis methods also

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 16

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

enabled us to distinguish between features that wereimportant for fixations in general and features thatwere specifically important for the search tasks studiedWe showed how this search strategy evolves over timeand how distinct factors are important at different timescales Our study thus establishes essential groundworkto examine the neural representation of priority and theneural mechanisms by which a saccade decision iscomputed using neurophysiological methods in non-human primate models

Keywords nonhuman primates eye movements nat-ural scenes visual search priority map peripheral visionsaliency edgendashenergy relevance orientation statistics

Acknowledgments

This work was funded by NIH Grant1R01EY021579-01A1 We thank B M lsquot Hart foruseful discussions and comments on the manuscript

Commercial relationships noneCorresponding author Pavan RamkumarEmail pavanramkumarnorthwesterneduAddress Department of Physical Medicine and Reha-bilitation Northwestern University and RehabilitationInstitute of Chicago Chicago IL USA

Footnote

1 For any feature an AUC 05 suggests thatM(fixated) M(controls)

References

Baddeley R J amp Tatler B W (2006) High frequencyedges (but not contrast) predict where we fixate ABayesian system identification analysis VisionResearch 46 2824ndash2833

Berg D J Boehnke S E Marino R A Munoz DP amp Itti L (2009) Free viewing of dynamicstimuli by humans and monkeys Journal of Vision9(5)19 1ndash15 httpwwwjournalofvisionorgcontent9519 doi1011679519 [PubMed][Article]

Bisley J W amp Goldberg M E (2010) Attentionintention and priority in the parietal lobe AnnualReview of Neuroscience 33 1ndash21

Borji A Sihite N D amp Itti L (2013) Quantitativeanalysis of humanndashmodel agreement in visual

saliency modeling A comparative study IEEETrans Image Processing 23 55ndash69

Bravo M J amp Farid H (2009) The specificity of thesearch template Journal of Vision 9(1)34 1ndash9httpwwwjournalofvisionorgcontent9134doi1011679134 [PubMed] [Article]

Bravo M J amp Farid H (2012) Task demandsdetermine the specificity of the search templateAttention Perception and Psychophysics 74 124ndash131

Castelhano M S amp Heaven C (2010) The relativecontribution of scene context and target features tovisual search in scenes Attention Perception ampPsychophysics 72 1283ndash1297

Cristino F amp Baddeley R J (2009) The nature ofvisual representations involved in eye movementswhen walking down the street Visual Cognition 17880ndash903

Ehinger K A Hidalgo-Sotelo B amp Torralba A(2009) Modeling search for people in 900 scenes Acombined source model of eye guidance VisualCognition 17 945ndash978

Einhauser W amp Konig P (2003) Does luminance-contrast contribute to a saliency map for overtvisual attention European Journal of Neuroscience17 1089ndash1097

Einhauser W Rutishauser U amp Koch C (2008)Task-demands can immediately reverse the effectsof sensory-driven saliency in complex visual stimuliJournal of Vision 8(2)2 1ndash19 httpwwwjournalofvisionorgcontent822 doi101167822 [PubMed] [Article]

Engmann S lsquot Hart B M Sieren T Onat SKonig P amp Einhauser W (2009) Saliency on anatural scene background Effects of color andluminance contrast add linearly Attention Per-ception amp Psychophysics 71 1337ndash1352

Fecteau J H amp Munoz D P (2006) Saliencerelevance and firing A priority map for targetselection Trends in Cognitive Sciences 10 382ndash390

Fernandes H L Stevenson I H Phillips A NSegraves M A amp Kording K P (2014) Saliencyand saccade encoding in the frontal eye field duringnatural scene search Cerebral Cortex 24 3232ndash3245

Findlay J M (1982) Global visual processing forsaccadic eye movements Vision Research 22 1033ndash1045

Ganguli D Freeman J Rajashekar U amp Simon-celli E P (2010) Orientation statistics at fixationJournal of Vision 10(7) 533 httpwww

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 17

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

journalofvisionorgcontent107533 doi101167107533 [Abstract]

Geisler W S amp Perry J S (1998) Real-time foveatedmultiresolution system for low-bandwidth videocommunication In Photonics Westrsquo98 ElectronicImaging 294ndash305 International Society for Opticsand Photonics

Gottlieb J (2012) Attention learning and the valueof information Neuron 76 281ndash295

Gottlieb J Oudeyer P Y Lopes M amp Baranes A(2013) Information-seeking curiosity and atten-tion Computational and neural mechanismsTrends in Cognitive Sciences 17 585ndash593

Hays A V Richmond B J amp Optican L M (1982)Unix-based multiple-process system for real-timedata acquisition and controlWESCON ConferenceProceedings 2 1ndash10

Henderson J M Brockmole J R Castelhano M SMack M L V R Fischer M Murray W ampHill RW (2007) Visual saliency does not accountfor eye movements during visual search in real-world scenes In R P G van Gompel (Ed) Eyemovements A window on mind and brain (pp 537ndash562) Amsterdam The Netherlands Elsevier

Hooge I T C amp Erkelens C J (1999) Peripheralvision and oculomotor control during visual searchVision Research 39 1567ndash1575

Hwang A D Higgins E C amp Pomplun M (2009)A model of top-down attentional control duringvisual search in complex scenes Journal of Vision9(5)25 1ndash18 httpwwwjournalofvisionorgcontent9525 doi1011679525 [PubMed][Article]

Itti L amp Koch C (2001) Computational modeling ofvisual attention Nature Reviews Neuroscience 2194ndash203

Itti L Koch C amp Niebur E (1998) A model ofsaliency-based visual attention for rapid sceneanalysis IEEE Transactions on Pattern Analysisand Machine Intelligence 20 1254ndash1259

Kanan C Tong M H Zhang L amp Cottrell G W(2009) SUN Top-down saliency using naturalstatistics Visual Cognition 17 979ndash1003

Kano F amp Tomonaga M (2009) How chimpanzeeslook at pictures A comparative eye tracking studyProceedings of the Royal Society B 276 1949ndash1955

Kienzle W Franz M O Scholkopf B amp Wich-mann F A (2009) Center-surround patternsemerge as optimal predictors for human saccadetargets Journal of Vision 9(5)7 1ndash15 httpwwwjournalofvisionorgcontent957 doi101167957 [PubMed] [Article]

Krieger G Rentschler I Hauske G Schill K ampZetzsche C (2000) Object and scene analysis bysaccadic eye-movements An investigation withhigher-order statistics Spatial Vision 13 201ndash214

Kuo R-S Kording K amp Segraves M A (2012)Predicting rhesus monkey eye movements duringnatural image search New Orleans LA Society forNeuroscience

Le Meur O Le Callet P amp Barba D (2007)Predicting visual fixations on video based on low-level visual features Vision Research 47 2483ndash2498

Malcolm G L amp Henderson J M (2009) The effectsof template specificity on visual search in real-world scenes Evidence from eye movementsJournal of Vision 9(11)8 1ndash13 httpwwwjournalofvisionorgcontent9118 doi1011679118 [PubMed] [Article]

McFadden D (1974) Conditional logit analysis ofqualitative choice behavior In P Zarembka (Ed)Frontiers in econometrics (pp 105ndash142) New YorkAcademic Press

Najemnik J amp Geisler W S (2005) Optimal eyemovement strategies in visual search Nature 434387ndash391

Najemnik J amp Geisler W S (2008) Eye movementstatistics in humans are consistent with an optimalsearch strategy Journal of Vision 8(3)4 1ndash14httpwwwjournalofvisionorgcontent834 doi101167834 [PubMed] [Article]

Najemnik J amp Geisler W S (2009) Simplesummation rule for optimal fixation selection invisual search Vision Research 49 1286ndash1294

Navalpakkam V amp Itti L (2006) An integratedmodel of top-down and bottom-up attention foroptimal object detection In Proceedings of theIEEE Conference on Computer Vision and PatternRecognition 2 (pp 2049ndash2056)

Nelder J A amp Wedderburn R W (1972) General-ized linear models In Journal of the RoyalStatistical Society Series A (General) (pp 370ndash384)

Oliva A amp Torralba A (2001) Modeling the shapeof the scene A holistic representation of the spatialenvelope International Journal of Computer Vision42 145ndash175

Phillips A amp Segraves M (2010) Predictive activityin the macaque frontal eye field neurons duringnatural scene searching Journal of Neurophysiolo-gy 103 1238ndash1252

Rajashekar U Bovik A C amp Cormack L K(2006) Visual search in noise Revealing the

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 18

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

influence of structural cues by gaze-contingentclassification image analysis Journal of Vision 6(4)7 379ndash386 httpwwwjournalofvisionorgcontent647 doi101167647 [PubMed] [Article]

Rajashekar U Cormack L K amp Bovik A C(2003) Image features that draw fixations InProceedings of the International Conference onImage Processing 2 (pp 313ndash316)

Reeder R R amp Peenlen M V (2013) The contentsof the search template for category-level search innatural scenes Journal of Vision 13(3)13 1ndash13httpwwwjournalofvisionorgcontent13313doi10116713313 [PubMed] [Article]

Reinagel P amp Zador A M (1999) Natural scenestatistics at the centre of gaze Network 10 341ndash350

Renninger L W Verghese P amp Coughlan J (2007)Where to look next Eye movements reduce localuncertainty Journal of Vision 7(3)6 1ndash17 httpwwwjournalofvisionorgcontent736 doi101167736 [PubMed] [Article]

Schutz A C Trommershauser J amp Gegenfurtner KR (2012) Dynamic integration of informationabout salience and value for saccadic eye move-ments Proceedings of the National Academy ofSciences 109(19) 7547ndash7552

Serences J T amp Yantis S (2006) Selective visualattention and perceptual coherence Trends inCognitive Sciences 10 38ndash45

lsquot Hart B M Schmidt H C E F Roth C ampEinhauser W (2013) Fixations on objects innatural scenes Dissociating importance from sa-lience Frontiers in Psychology 4

Tatler B W amp Vincent B T (2009) The prominenceof behavioral biases in eye guidance VisualCognition 17 1029ndash1054

Torralba A amp Oliva A (2003) Statistics of naturalimage categories Network Computation in NeuralSystems 14 391ndash412

Treisman A M amp Gelade G (1980) A feature-integration theory of attention Cognitive Psychol-ogy 12 97ndash136

Tseng P H Carmi R Cameron I G M MunozD P amp Itti L (2009) Quantifying center bias ofobservers in free viewing of dynamic naturalscenes Journal of Vision 9(7)4 1ndash17 httpwwwjournalofvisionorgcontent974 doi101167974 [PubMed] [Article]

Walther D amp Koch C (2006) Modeling attention tosalient proto-objects Neural Networks 19 1395ndash1407

Wilming N Harst S Schmidt N amp Konig P

(2013) Saccadic momentum of facilitation ofreturn saccades contribute to an optimal foragingstrategy PLoS Computational Biology 9 e1002871

Wolfe J M (1994) Guided search 20 A revisedmodel of visual search Psychonomic Bulletin andReview 1 202ndash238

Zelinsky G J (2008) A theory of eye movementsduring target acquisition Psychological Review115 787ndash835

Zelinsky G J Adeli H Peng Y amp Samaras D(2013) Modeling eye movements in a categoricalsearch task In Philosophical Transactions of theRoyal Society B 368 20130058

Zelinsky G J Rao R Hayhoe M amp Ballard D(1997) Eye movements reveal the spatio-temporaldynamics of visual search Psychological Science 8448ndash453

Appendix A Logistic regression

Model

Logistic regression is ideal for two-class classificationproblems with the two classes in our problemrepresented by fixated patches and shuffled controlsrespectively

In logistic regression the output is modeled as aBernoulli random variable with mean probability ofsuccessful outcome given by

pethy frac14 1jXbTHORN frac14 1

1thorn eXbethA1THORN

A Bernoulli random variable represents the outcomeof a biased coin toss in our problem the potentialoutcomes of the toss represent the possibilities ofwhether the patch was truly fixated or whether it was arandom shuffled control

yBernoulli pethy frac14 1jXbTHORNf g ethA2THORNThe probability of the outcome is then defined by the

joint influence of independent variables in the designmatrix X Here each column of X comprises the visualfeatures extracted from the image patches centered ateach fixation and each row is a fixation A bias term isalso included

The regression problem is then solved by estimatingthe coefficients b that minimize the negative loglikelihood of the model given by

L frac14 Xi

yilogethpiTHORN Xi

eth1 yiTHORNlogeth1 piTHORN ethA3THORN

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 19

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

Unlike linear regression where the residuals arenormally distributed and a closed-form solution existslogistic regression is a convex problem in that it istypically solved using gradient-based methods Inpractice we implemented the method using Matlabrsquosglmfit function

Goodness of fit and model comparison usinglikelihood ratios

We computed the likelihood ratios between themodel under consideration and a null model with onlya bias term The likelihood ratios are defined in termsof the model deviances given by

D0 frac14 2logethL0=LsatTHORN ethA3THORN

D frac14 2logethL=LsatTHORN ethA4THORNwhere L0 Lsat and L are the likelihoods of the nullmodel (one that only predicts the bias in the data) thesaturated model (one that perfectly predicts the data)and the model under consideration The likelihoodratio is then given by

DD0 frac14 2logethL=L0THORN ethA5THORNA negative likelihood ratio suggests that the model

under consideration significantly outperforms the nullmodel The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into agoodness of fit v2 measure given the number ofindependent variables in the model

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R2 The idea of the pseudo-R2 metric is to map thelikelihood ratio into a [0 1] range thus offering anintuition similar to the R2 for normally distributeddata

Many definitions exist for the pseudo-R2 but weused McFaddenrsquos formula RD

2frac14 1 LL0 where L isthe log-likelihood of the model under considerationand L0 is the log-likelihood of the baseline (intercept-only) model (McFadden et al 1974)

To estimate the relative effect size of each feature ora subset of features we used a leave-one-feature-out(leave-some-features-out) technique a variant of bestsubsets of stepwise regression techniques For eachfeature we fit a partial model comprising all but thatparticular feature (or subset) and computed thepseudo-R2 of the partial model Then to obtain ameasure of the relative increase in predictive powerobtained by adding that feature (or subset) back to thepartial model we used a measure called relativepseudo-R2 (Fernandes et al 2013) The relativemeasure is defined as 1 LfullLpartial where Lfull andLpartial are the log-likelihoods of the full and partialmodels The confidence intervals on this statistic wereobtained by computing standard errors on 10 cross-validation folds the relative pseudo-R2 in each case wascomputed on the held-out test set

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 20

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

  • Introduction
  • Methods
  • f01
  • e01
  • f02
  • f03
  • e02
  • e03
  • e04
  • e05
  • e06
  • e07
  • f04
  • Results
  • f05
  • f06
  • f07
  • t01
  • Discussion
  • f08
  • t02a
  • t02b
  • t02c
  • t02d
  • Conclusions
  • n1
  • Baddeley1
  • Berg1
  • Bisley1
  • Borji1
  • Bravo1
  • Bravo2
  • Castelhano1
  • Cristino1
  • Ehinger1
  • Einhauser1
  • Einhauser2
  • Engmann1
  • Fecteau1
  • Fernandes1
  • Findlay1
  • Ganguli1
  • Geisler1
  • Gottlieb1
  • Gottlieb2
  • Hays1
  • Henderson1
  • Hooge1
  • Hwang1
  • Itti1
  • Itti2
  • Kanan1
  • Kano1
  • Kienzle1
  • Krieger1
  • Kuo1
  • LeMeur1
  • Malcolm1
  • McFadden1
  • Najemnik1
  • Najemnik2
  • Najemnik3
  • Navalpakkam1
  • Nelder1
  • Oliva1
  • Phillips1
  • Rajashekar1
  • Rajashekar2
  • Reeder1
  • Reinagel1
  • Renninger1
  • Schutz1
  • Serences1
  • ltHart1
  • Tatler2
  • Torralba1
  • Treisman1
  • Tseng1
  • Walther1
  • Wilming1
  • Wolfe1
  • Zelinsky1
  • Zelinsky2
  • Zelinsky3
  • Appendix A Logistic regression
  • e08
  • e09
  • e10
  • e11
  • e12
  • e13
Page 16: Pavan Ramkumar - Modeling peripheral visual acuity enables discovery …pavanramkumar.github.io/pdfs/08-Ramkumar_etal_JVis_2015.pdf · 2017-01-24 · Modeling peripheral visual acuity

potential variance in fixated patches Indeed KriegerRentschler Hauske Schill and Zetzsche (2000) showedthat locations with larger spatial variance are morelikely to be fixated but this variance largely emergesfrom higher-order statistics quantifiable using bispec-tral densities and is therefore invisible in the powerspectra Their results provide a cautionary note to theinterpretation of power-spectral differences that weobserved in our analysis

A number of studies in humans support our findingthat the influence of bottom-up features is diminishedwhile that of task-relevant features is enhanced duringsearch in natural scenes (Castelhano amp Heaven 2010Ehinger et al 2009 Einhauser et al 2008 Hendersonet al 2007 Malcolm amp Henderson 2009 SchutzTrommershauser amp Gegenfurtner 2012) Althoughnone of these studies have explicitly modeled thedegraded representation of visual information in theperiphery they are in agreement with our finding thatsaliency gets explained away by edgendashenergy

Studies that try to characterize the features humansuse while searching or free-viewing artificial or naturalstimuli are generally consistent with the results weobtain in our nonhuman primate study In the light ofthese studies on humans our study while by no meansmethodologically novel supports the existence of verysimilar eye-movement strategies in nonhuman pri-mates This is a necessary first step towards under-standing the neural circuits that compute thesedecisions using invasive electrophysiology An addi-tional comparative study repeating the same task inhumans might be a useful bridge between existingstudies of natural scene search in humans and futureresults from primate electrophysiology studies

The contents of the bottom-up priority map

Given that our experiment was a search task and thefact that the influence of bottom-up features isdiminished during search our study might not be idealto examine the contents of the bottom-up priority mapHowever the explaining-away analysis built intomultivariate logistic regression enabled us to teaseapart the relative influences of a multiscale saliencymetric and edgendashenergy

Multivariate analysis might help resolve conflictingfindings in many human studies of eye movementbehavior (Baddeley amp Tatler 2006 Einhauser amp Konig2003 Reinagel amp Zador 1999) Although some of thesestudies involve free-viewing tasks and are therefore notdirectly comparable our findings agree best with thestudy by Baddeley and Tatler (2006) who found thatedge-detectors at two different spatial scales betterpredicted fixation behavior than contrast or luminanceIn contrast a free-viewing study comparing primate

and human fixations (Berg et al 2009) suggested thatsaliency triumphs over luminance contrast Our ap-proach to model peripheral degradation for featureextraction and to deploy an explaining-away techniquefor data analysis could potentially resolve this dis-agreement

The contents of the top-down search template

A long-standing hypothesis in the area of visualsearch (Treisman amp Gelade 1980 Wolfe 1994)suggests that search is guided by the matching ofincoming visual input to an internal search templateThe extent to which the internal template approximatesthe target is a matter of current debate On the onehand more information in the internal template seemsto result in better search performance (eg Hwang etal 2009) On the other hand if the internal template isapproximate it allows for variability in target featuresacross trials (eg Bravo amp Farid 2009) AlthoughBravo and Farid (2009) assessed various task-relevantcandidate features (including similarity of luminancecontrast spatial frequency orientation and gradient ofthe search target) for their predictive power of fixationlocations their use of Pearson correlation coefficientand ROC areas prevents a fair comparison of therelative predictive power of correlated features Fur-ther the extent to which the search target can bedescribed by low-level visual features also influences theinternal template (Bravo amp Farid 2012 Reeder ampPeelen 2013) making generalizations across tasksdifficult In our study the search target is a fixed Gaborpatch that can be described easily by low-level featuresOur results show that relevance (a metric thatmaximizes an exact match between the Gabor and thelocal visual features) orientedness and the orientation-sensitive metric of verticalness can predict fixations tovarying degrees in the two monkeys Thus even in therestricted context of our Gabor search task it seemsthat monkeys adopt different internal templates com-prising either all of the target features (relevance) oronly a subset of its features (verticalness horizontal-ness)

Conclusions

We studied the problem of why we look where we doby studying search in naturalistic conditions Bymodeling peripheral visual degradation in retinocentriccoordinates and by using nuanced analysis methodswe were able to quantify the relative extents to whichvarious bottom-up and task-relevant image featuresinfluenced eye movements These analysis methods also

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 16

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

enabled us to distinguish between features that wereimportant for fixations in general and features thatwere specifically important for the search tasks studiedWe showed how this search strategy evolves over timeand how distinct factors are important at different timescales Our study thus establishes essential groundworkto examine the neural representation of priority and theneural mechanisms by which a saccade decision iscomputed using neurophysiological methods in non-human primate models

Keywords nonhuman primates eye movements nat-ural scenes visual search priority map peripheral visionsaliency edgendashenergy relevance orientation statistics

Acknowledgments

This work was funded by NIH Grant1R01EY021579-01A1 We thank B M lsquot Hart foruseful discussions and comments on the manuscript

Commercial relationships noneCorresponding author Pavan RamkumarEmail pavanramkumarnorthwesterneduAddress Department of Physical Medicine and Reha-bilitation Northwestern University and RehabilitationInstitute of Chicago Chicago IL USA

Footnote

1 For any feature an AUC 05 suggests thatM(fixated) M(controls)

References

Baddeley R J amp Tatler B W (2006) High frequencyedges (but not contrast) predict where we fixate ABayesian system identification analysis VisionResearch 46 2824ndash2833

Berg D J Boehnke S E Marino R A Munoz DP amp Itti L (2009) Free viewing of dynamicstimuli by humans and monkeys Journal of Vision9(5)19 1ndash15 httpwwwjournalofvisionorgcontent9519 doi1011679519 [PubMed][Article]

Bisley J W amp Goldberg M E (2010) Attentionintention and priority in the parietal lobe AnnualReview of Neuroscience 33 1ndash21

Borji A Sihite N D amp Itti L (2013) Quantitativeanalysis of humanndashmodel agreement in visual

saliency modeling A comparative study IEEETrans Image Processing 23 55ndash69

Bravo M J amp Farid H (2009) The specificity of thesearch template Journal of Vision 9(1)34 1ndash9httpwwwjournalofvisionorgcontent9134doi1011679134 [PubMed] [Article]

Bravo M J amp Farid H (2012) Task demandsdetermine the specificity of the search templateAttention Perception and Psychophysics 74 124ndash131

Castelhano M S amp Heaven C (2010) The relativecontribution of scene context and target features tovisual search in scenes Attention Perception ampPsychophysics 72 1283ndash1297

Cristino F amp Baddeley R J (2009) The nature ofvisual representations involved in eye movementswhen walking down the street Visual Cognition 17880ndash903

Ehinger K A Hidalgo-Sotelo B amp Torralba A(2009) Modeling search for people in 900 scenes Acombined source model of eye guidance VisualCognition 17 945ndash978

Einhauser W amp Konig P (2003) Does luminance-contrast contribute to a saliency map for overtvisual attention European Journal of Neuroscience17 1089ndash1097

Einhauser W Rutishauser U amp Koch C (2008)Task-demands can immediately reverse the effectsof sensory-driven saliency in complex visual stimuliJournal of Vision 8(2)2 1ndash19 httpwwwjournalofvisionorgcontent822 doi101167822 [PubMed] [Article]

Engmann S lsquot Hart B M Sieren T Onat SKonig P amp Einhauser W (2009) Saliency on anatural scene background Effects of color andluminance contrast add linearly Attention Per-ception amp Psychophysics 71 1337ndash1352

Fecteau J H amp Munoz D P (2006) Saliencerelevance and firing A priority map for targetselection Trends in Cognitive Sciences 10 382ndash390

Fernandes H L Stevenson I H Phillips A NSegraves M A amp Kording K P (2014) Saliencyand saccade encoding in the frontal eye field duringnatural scene search Cerebral Cortex 24 3232ndash3245

Findlay J M (1982) Global visual processing forsaccadic eye movements Vision Research 22 1033ndash1045

Ganguli D Freeman J Rajashekar U amp Simon-celli E P (2010) Orientation statistics at fixationJournal of Vision 10(7) 533 httpwww

Journal of Vision (2015) 15(3)19 1ndash20 Ramkumar Fernandes Kording amp Segraves 17

Downloaded From httpjovarvojournalsorgpdfaccessashxurl=dataJournalsJOV933691 on 11022015

journalofvisionorgcontent107533 doi101167107533 [Abstract]

Geisler W S amp Perry J S (1998) Real-time foveatedmultiresolution system for low-bandwidth videocommunication In Photonics Westrsquo98 ElectronicImaging 294ndash305 International Society for Opticsand Photonics

Gottlieb J (2012) Attention learning and the valueof information Neuron 76 281ndash295

Gottlieb J Oudeyer P Y Lopes M amp Baranes A(2013) Information-seeking curiosity and atten-tion Computational and neural mechanismsTrends in Cognitive Sciences 17 585ndash593

Hays A V Richmond B J amp Optican L M (1982)Unix-based multiple-process system for real-timedata acquisition and controlWESCON ConferenceProceedings 2 1ndash10

Henderson J M Brockmole J R Castelhano M SMack M L V R Fischer M Murray W ampHill RW (2007) Visual saliency does not accountfor eye movements during visual search in real-world scenes In R P G van Gompel (Ed) Eyemovements A window on mind and brain (pp 537ndash562) Amsterdam The Netherlands Elsevier

Hooge, I. T. C., & Erkelens, C. J. (1999). Peripheral vision and oculomotor control during visual search. Vision Research, 39, 1567–1575.

Hwang, A. D., Higgins, E. C., & Pomplun, M. (2009). A model of top-down attentional control during visual search in complex scenes. Journal of Vision, 9(5):25, 1–18, http://www.journalofvision.org/content/9/5/25, doi:10.1167/9.5.25. [PubMed] [Article]

Itti, L., & Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2, 194–203.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20, 1254–1259.

Kanan, C., Tong, M. H., Zhang, L., & Cottrell, G. W. (2009). SUN: Top-down saliency using natural statistics. Visual Cognition, 17, 979–1003.

Kano, F., & Tomonaga, M. (2009). How chimpanzees look at pictures: A comparative eye tracking study. Proceedings of the Royal Society B, 276, 1949–1955.

Kienzle, W., Franz, M. O., Schölkopf, B., & Wichmann, F. A. (2009). Center-surround patterns emerge as optimal predictors for human saccade targets. Journal of Vision, 9(5):7, 1–15, http://www.journalofvision.org/content/9/5/7, doi:10.1167/9.5.7. [PubMed] [Article]

Krieger, G., Rentschler, I., Hauske, G., Schill, K., & Zetzsche, C. (2000). Object and scene analysis by saccadic eye-movements: An investigation with higher-order statistics. Spatial Vision, 13, 201–214.

Kuo, R.-S., Kording, K., & Segraves, M. A. (2012). Predicting rhesus monkey eye movements during natural image search. New Orleans, LA: Society for Neuroscience.

Le Meur, O., Le Callet, P., & Barba, D. (2007). Predicting visual fixations on video based on low-level visual features. Vision Research, 47, 2483–2498.

Malcolm, G. L., & Henderson, J. M. (2009). The effects of template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11):8, 1–13, http://www.journalofvision.org/content/9/11/8, doi:10.1167/9.11.8. [PubMed] [Article]

McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). New York: Academic Press.

Najemnik, J., & Geisler, W. S. (2005). Optimal eye movement strategies in visual search. Nature, 434, 387–391.

Najemnik, J., & Geisler, W. S. (2008). Eye movement statistics in humans are consistent with an optimal search strategy. Journal of Vision, 8(3):4, 1–14, http://www.journalofvision.org/content/8/3/4, doi:10.1167/8.3.4. [PubMed] [Article]

Najemnik, J., & Geisler, W. S. (2009). Simple summation rule for optimal fixation selection in visual search. Vision Research, 49, 1286–1294.

Navalpakkam, V., & Itti, L. (2006). An integrated model of top-down and bottom-up attention for optimal object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2 (pp. 2049–2056).

Nelder, J. A., & Wedderburn, R. W. (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A (General), 135, 370–384.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Phillips, A., & Segraves, M. (2010). Predictive activity in the macaque frontal eye field neurons during natural scene searching. Journal of Neurophysiology, 103, 1238–1252.

Rajashekar, U., Bovik, A. C., & Cormack, L. K. (2006). Visual search in noise: Revealing the influence of structural cues by gaze-contingent classification image analysis. Journal of Vision, 6(4):7, 379–386, http://www.journalofvision.org/content/6/4/7, doi:10.1167/6.4.7. [PubMed] [Article]

Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the International Conference on Image Processing, 2 (pp. 313–316).

Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3):13, 1–13, http://www.journalofvision.org/content/13/3/13, doi:10.1167/13.3.13. [PubMed] [Article]

Reinagel, P., & Zador, A. M. (1999). Natural scene statistics at the centre of gaze. Network, 10, 341–350.

Renninger, L. W., Verghese, P., & Coughlan, J. (2007). Where to look next? Eye movements reduce local uncertainty. Journal of Vision, 7(3):6, 1–17, http://www.journalofvision.org/content/7/3/6, doi:10.1167/7.3.6. [PubMed] [Article]

Schütz, A. C., Trommershäuser, J., & Gegenfurtner, K. R. (2012). Dynamic integration of information about salience and value for saccadic eye movements. Proceedings of the National Academy of Sciences, 109(19), 7547–7552.

Serences, J. T., & Yantis, S. (2006). Selective visual attention and perceptual coherence. Trends in Cognitive Sciences, 10, 38–45.

't Hart, B. M., Schmidt, H. C. E. F., Roth, C., & Einhäuser, W. (2013). Fixations on objects in natural scenes: Dissociating importance from salience. Frontiers in Psychology, 4.

Tatler, B. W., & Vincent, B. T. (2009). The prominence of behavioral biases in eye guidance. Visual Cognition, 17, 1029–1054.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12, 97–136.

Tseng, P. H., Carmi, R., Cameron, I. G. M., Munoz, D. P., & Itti, L. (2009). Quantifying center bias of observers in free viewing of dynamic natural scenes. Journal of Vision, 9(7):4, 1–17, http://www.journalofvision.org/content/9/7/4, doi:10.1167/9.7.4. [PubMed] [Article]

Walther, D., & Koch, C. (2006). Modeling attention to salient proto-objects. Neural Networks, 19, 1395–1407.

Wilming, N., Harst, S., Schmidt, N., & König, P. (2013). Saccadic momentum and facilitation of return saccades contribute to an optimal foraging strategy. PLoS Computational Biology, 9, e1002871.

Wolfe, J. M. (1994). Guided search 2.0: A revised model of visual search. Psychonomic Bulletin and Review, 1, 202–238.

Zelinsky, G. J. (2008). A theory of eye movements during target acquisition. Psychological Review, 115, 787–835.

Zelinsky, G. J., Adeli, H., Peng, Y., & Samaras, D. (2013). Modeling eye movements in a categorical search task. Philosophical Transactions of the Royal Society B, 368, 20130058.

Zelinsky, G. J., Rao, R., Hayhoe, M., & Ballard, D. (1997). Eye movements reveal the spatio-temporal dynamics of visual search. Psychological Science, 8, 448–453.

Appendix A: Logistic regression

Model

Logistic regression is ideal for two-class classification problems, with the two classes in our problem represented by fixated patches and shuffled controls, respectively.

In logistic regression, the output is modeled as a Bernoulli random variable with mean probability of a successful outcome given by

$p(y = 1 \mid X, b) = \dfrac{1}{1 + e^{-Xb}}$  (A1)

A Bernoulli random variable represents the outcome of a biased coin toss; in our problem, the potential outcomes of the toss represent the possibilities of whether the patch was truly fixated or whether it was a random shuffled control.

$y \sim \mathrm{Bernoulli}\{p(y = 1 \mid X, b)\}$  (A2)

The probability of the outcome is then defined by the joint influence of the independent variables in the design matrix X. Here, each column of X comprises the visual features extracted from the image patches centered at each fixation, and each row is a fixation. A bias term is also included.

The regression problem is then solved by estimating the coefficients b that minimize the negative log likelihood of the model, given by

$L = -\sum_i y_i \log(p_i) - \sum_i (1 - y_i) \log(1 - p_i)$  (A3)
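As a concrete illustration of Equations A1 through A3, the following Matlab sketch computes the fixation probabilities and the negative log likelihood for a given set of coefficients; the variable names X, y, and b are hypothetical and do not appear in the original analysis code.

    % X: n-by-(d+1) design matrix (including a column of ones for the bias term)
    % y: n-by-1 vector of labels (1 = fixated patch, 0 = shuffled control)
    % b: (d+1)-by-1 vector of regression coefficients
    p = 1 ./ (1 + exp(-X * b));                           % Equation A1
    % y ~ Bernoulli(p)                                      Equation A2
    L = -sum(y .* log(p)) - sum((1 - y) .* log(1 - p));   % Equation A3: negative log likelihood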


Unlike linear regression, where the residuals are normally distributed and a closed-form solution exists, logistic regression has no closed-form solution; it is, however, a convex problem that is typically solved using gradient-based methods. In practice, we implemented the method using Matlab's glmfit function.
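A minimal sketch of this fitting step is given below, assuming a hypothetical n-by-d feature matrix named features (one row per image patch) and a label vector isFixated; both names are illustrative rather than the original implementation. Note that glmfit adds the intercept (bias) term automatically.

    % Fit the logistic regression model (Statistics Toolbox)
    b = glmfit(features, isFixated, 'binomial', 'link', 'logit');
    % Fitted probabilities p(y = 1 | X, b) and model log likelihood
    phat = glmval(b, features, 'logit');
    ll = sum(isFixated .* log(phat) + (1 - isFixated) .* log(1 - phat));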

Goodness of fit and model comparison using likelihood ratios

We computed the likelihood ratios between the model under consideration and a null model with only a bias term. The likelihood ratios are defined in terms of the model deviances, given by

$D_0 = -2 \log(L_0 / L_{\mathrm{sat}})$  (A4)

$D = -2 \log(L / L_{\mathrm{sat}})$  (A5)

where $L_0$, $L_{\mathrm{sat}}$, and $L$ are the likelihoods of the null model (one that only predicts the bias in the data), the saturated model (one that perfectly predicts the data), and the model under consideration, respectively. The likelihood ratio is then given by

$D - D_0 = -2 \log(L / L_0)$  (A6)

A negative likelihood ratio indicates that the model under consideration outperforms the null model. The likelihood ratio is distributed as a chi-squared statistic and can therefore be converted into a goodness-of-fit $\chi^2$ measure, given the number of independent variables in the model.
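Continuing the hypothetical variables from the sketch above, this deviance-based comparison against the bias-only model could be computed as follows; chi2cdf is taken from the Statistics Toolbox, and the degrees of freedom equal the number of features added to the null model.

    p0 = mean(isFixated);                                  % bias-only (null) model: constant probability
    ll0 = sum(isFixated .* log(p0) + (1 - isFixated) .* log(1 - p0)); % null log likelihood
    dD = -2 * (ll - ll0);                                  % likelihood ratio; negative when the model beats the null
    df = size(features, 2);                                % one degree of freedom per added feature
    pval = 1 - chi2cdf(-dD, df);                           % significance of the improvement over the null model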

Relative pseudo-R2

A related metric to the likelihood ratio is the pseudo-R2. The idea of the pseudo-R2 metric is to map the likelihood ratio into a [0, 1] range, thus offering an intuition similar to the R2 for normally distributed data.

Many definitions exist for the pseudo-R2, but we used McFadden's formula, $R_D^2 = 1 - L/L_0$, where $L$ is the log-likelihood of the model under consideration and $L_0$ is the log-likelihood of the baseline (intercept-only) model (McFadden, 1974).

To estimate the relative effect size of each feature or a subset of features, we used a leave-one-feature-out (leave-some-features-out) technique, a variant of best-subsets and stepwise regression techniques. For each feature, we fit a partial model comprising all but that particular feature (or subset) and computed the pseudo-R2 of the partial model. Then, to obtain a measure of the relative increase in predictive power obtained by adding that feature (or subset) back to the partial model, we used a measure called relative pseudo-R2 (Fernandes et al., 2014). The relative measure is defined as $1 - L_{\mathrm{full}}/L_{\mathrm{partial}}$, where $L_{\mathrm{full}}$ and $L_{\mathrm{partial}}$ are the log-likelihoods of the full and partial models. The confidence intervals on this statistic were obtained by computing standard errors over 10 cross-validation folds; the relative pseudo-R2 was in each case computed on the held-out test set.
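The sketch below illustrates this leave-one-feature-out procedure with 10 cross-validation folds, again using the hypothetical features and isFixated variables from above; the fold assignment via cvpartition and the feature index j are illustrative choices, not the original implementation.

    nFolds = 10;
    cv = cvpartition(numel(isFixated), 'KFold', nFolds);  % 10 cross-validation folds
    j = 1;                                                 % feature (or feature subset) to leave out
    keep = setdiff(1:size(features, 2), j);                % columns retained in the partial model
    relR2 = zeros(nFolds, 1);
    for k = 1:nFolds
        tr = training(cv, k); te = test(cv, k);
        bFull = glmfit(features(tr, :),    isFixated(tr), 'binomial', 'link', 'logit');
        bPart = glmfit(features(tr, keep), isFixated(tr), 'binomial', 'link', 'logit');
        pFull = glmval(bFull, features(te, :),    'logit');
        pPart = glmval(bPart, features(te, keep), 'logit');
        yTe = isFixated(te);
        llFull = sum(yTe .* log(pFull) + (1 - yTe) .* log(1 - pFull));
        llPart = sum(yTe .* log(pPart) + (1 - yTe) .* log(1 - pPart));
        relR2(k) = 1 - llFull / llPart;                    % relative pseudo-R2 on the held-out test set
    end
    sem = std(relR2) / sqrt(nFolds);                       % standard error across the 10 folds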
