
The spatio-temporal dynamics of visual letter recognition

Daniel Fiset

University of Victoria, Victoria, British Columbia, Canada

Caroline Blais, Martin Arguin, Karine Tadros, Catherine Ethier-Majcher, Daniel Bub, and Frederic Gosselin

Centre de Recherche en Neuropsychologie Experimentale et Cognition, Departement de Psychologie, Universite de Montreal, Montreal, Quebec, Canada

We applied the Bubbles technique to reveal directly the spatio-temporal features of uppercase Arial letter identification. We asked four normal readers to each identify 26,000 letters that were randomly sampled in space and time; afterwards, we performed multiple linear regressions on the participants’ response accuracy and the space–time samples. We contend that each cluster of connected significant regression coefficients is a letter feature. To bridge the gap between the letter identification literature and this experiment, we also determined the relative importance of the features proposed in the letter identification literature. Results show clear modulations of the relative importance of the letter features of some letters across time, demonstrating that letter features are not always extracted simultaneously at constant speeds. Furthermore, of all the feature classes proposed in the literature, line terminations and horizontals appear to be the two most important for letter identification.


Most textbooks in cognitive psychology use alphabetic character identification as a starting point in understanding the visual and cognitive mechanisms involved in visual object recognition (e.g., Lindsay & Norman, 1977; Matlin, 2005; Medin, Ross, & Markman, 2005; Neisser, 1967). As a microcosm, letters have many advantages over other more ecologically oriented categories of visual objects (e.g., animals, houses, etc.). They were designed as a limited set of objects composed of a limited assortment of traits, with the means to meet specific communication needs (i.e., reading and writing). These traits have relatively simple shapes and are typically displayed using two tones. For example, the uppercase letter “A” in the Arial font is composed of two


Correspondence should be addressed to Martin Arguin or Frederic Gosselin, Departement de psychologie, Universite de Montreal, C.P. 6128, Succ. Centre-ville, Montreal, QC, H3C 3J7, Canada (E-mail: [email protected] [email protected]).

We thank those who took part in this study. This research was supported by a grant from the Canadian Institute of Health Research (CIHR) to Martin Arguin, Frederic Gosselin, and Daniel Bub; by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) to Martin Arguin and to Frederic Gosselin; by a scholarship from the James S. McDonnell Foundation (Perceptual Expertise Network) and by a postdoctoral scholarship from the Fonds Quebecois de Recherche en Nature et Technologies (FQRNT) to Daniel Fiset; and by a FQRNT graduate scholarship to Caroline Blais and to Karine Tadros.

© 0000 Psychology Press, an imprint of the Taylor & Francis Group, an Informa business. http://www.psypress.com/cogneuropsychology DOI:10.1080/02643290802421160

COGNITIVE NEUROPSYCHOLOGY, 0000, 00 (0), 1–12


slanted lines inclined at about 60 degrees that are joined at their superior extremity and that are connected by a horizontal line that touches both slants approximately in their centres. Of course, the trait description of the uppercase “A” may vary slightly across fonts but it remains sufficiently robust that the letter remains recognizable despite these small variations. Furthermore, letter identification as the starting point of word recognition and reading (Pelli, Farell, & Moore, 2003) has strong real-life relevance since a large proportion of our modern life activities involve this function. As the above illustration suggests, it is relatively easy to enumerate the letter traits necessary to describe each of the 26 uppercase letters of the alphabet in predefined fonts (e.g., Arial). Whether such traits have any relevance for the way human letter identification is achieved, however, remains uncertain.

Seeking to identify a psychologically valid set of letter features (e.g., E. J. Gibson, 1969; Rumelhart & Siple, 1974), researchers in cognitive sciences have favoured data from confusion matrices (Boles & Clifford, 1989; Bouma, 1971; Briggs & Hocevar, 1975; Gervais, Harvey, & Roberts, 1984; Geyer, 1977; Gilmore, Hersh, Caramazza, & Griffin, 1979; Loomis, 1982; Townsend, 1971; Van Der Heijden, Malhas, & Van Den Roovaart, 1984). A confusion matrix is constructed by measuring the human participant’s ability to distinguish single letters in very demanding or special conditions so that errors frequently occur, typically as often as on 50% of trials. For example, some researchers examined the performance of children who had not yet integrated the exact visual form of letters (E. J. Gibson, Gibson, Pick, & Osser, 1962); others studied the performance of skilled readers when identifying letters presented for a brief duration (Townsend, 1971) or with extremely low contrast (Geyer, 1977). In these confusion matrices, errors in letter discrimination are thought to be helpful in defining the traits necessary for distinguishing letters from one another. Hence, it is commonly assumed that the frequent confusion between the uppercase “E” and “F” in these specific conditions validates the inferior horizontal line of the uppercase “E” as a diagnostic trait for the recognition of these letters. Even if this proposition makes sense, it does not tell us which part(s) of the bar help to discriminate between these two letters. For instance, it could be the intersection between the vertical and the horizontal bar, the termination of the horizontal bar, or the horizontal bar itself.

Much difficulty has been encountered when it comes to pinpointing the exact diagnostic areas for letter discrimination. We believe this originates from the vast gap between the letter confusion data that have been compiled and the letter features that have been proposed. In fact, we question whether letter-confusion matrices constitute the appropriate tool to provide a decisive set of data for determining the diagnostic features for letter identification. In particular, it is important to bear in mind that all the experimental manipulations required for the creation of letter-confusion matrices (low contrast or rapid presentation) exacerbate the relative importance of low spatial frequencies (e.g., Mazer, Vinje, McDermott, Schiller, & Gallant, 2002). Since this visual information is not optimal for human vision and leads to very high error rates, it may be inadequate for the discovery of the letter features underlying reading in daily life.

In this study, we used a classification image technique (e.g., Eckstein & Ahumada, 2002; Gosselin & Schyns, 2004) called Bubbles (Gosselin & Schyns, 2001) that uncovers more directly the letter components driving accurate recognition. The underlying logic of Bubbles is that if some piece of visual information is necessary to perform the task at hand, masking this information will impair performance, and revealing it will lead to better performance. The plane of regression coefficients that is obtained through multiple linear regressions of performance as a function of the Bubbles masks used to sample information on every trial is called a classification image, and it reveals the effective information upon which the observed performances are based. We recently employed a version of Bubbles in order to reveal the potent visual features mediating uppercase and lowercase Arial letter identification across different spatial frequency bands (Fiset


et al., 2008). The analyses conducted separately on each of the 26 uppercase and 26 lowercase letters confirmed that the spatial frequency information between 2 and 4 cycles per letter conveys the most potent visual information (Chung, Legge, & Tjan, 2002; Ginsburg, 1980; Legge, Pelli, Rubin, & Schleske, 1985; Majaj, Pelli, Kurshan, & Palomares, 2002; Parish & Sperling, 1991; Solomon & Pelli, 1994). To synthesize the large amount of data obtained from this experiment and to link it to the letter identification literature published during the years 1960–1980, we also determined the relative importance of the sets of features proposed in that literature as well as the relative importance of line terminations. We found that terminations, relatively small features found at the extremities of lines, and horizontals were the most effective in driving performance. To the best of our knowledge, this was the first empirical demonstration that line terminations are of crucial importance for letter identification.

Here, we examine the space–time features for Arial uppercase letter identification by using a dynamic version of the Bubbles method (Blais, Fiset, Arguin, Jolicoeur, & Gosselin, 2008; Vinette, Gosselin, & Schyns, 2004). Extending the logic of the spatial Bubbles technique briefly discussed above to the time dimension amounts to saying that the probability of a correct answer should decrease if the information that is efficient for letter identification at a particular spatial location and moment is not revealed at that spatial location and moment, and that it should increase if this information is revealed at that spatial location and moment. Therefore, in order to determine the efficient use of spatio-temporal information, we performed multiple linear regression between the participants’ response accuracy and the space–time bubbles.

Method

Participants

A total of 4 students from the University of Montreal took part in this experiment. All had normal or corrected-to-normal visual acuity.

Materials and stimuli

Stimuli were displayed on a high-resolution Sony monitor at a refresh rate of 120 Hz. The experiment ran on a Macintosh G4 computer. The experimental program was written in Matlab using functions from the Psychophysics Toolbox (Brainard, 1997; Pelli, 1997). The viewing distance was maintained at 57 cm by using a chinrest. Stimuli were uppercase letters printed in Arial font subtending on average 0.78 degrees of visual angle horizontally and 0.97 degrees of visual angle vertically. They appeared in dark grey (2.1 cd/m²) over a light grey background (64.8 cd/m²) and were sampled in space and time. More specifically, the “bubblized” movies consisted of a sequence of 12 successive frames, each presented on screen for a duration of 8.33 ms (for a total stimulus duration of 100 ms), displaying one letter of the alphabet sampled with Gaussian apertures (i.e., bubbles) randomly located in space–time (see Figure 1). Therefore, the spatial information (e.g., different groups of pixels in a letter) available to participants varied as a function of time within a trial, and the sequence of space–time bubbles also varied randomly across trials. Each bubble had a standard deviation of 0.1 degrees of visual angle (3 pixels) in the spatial domain and a standard deviation of 17.3 ms (2.08 frames) in the temporal domain. The temporal full width at half maximum of a bubble (40.7 ms) is less than the time required to plan and execute an attentional saccade, which ensures that participants are unable to shift their attention towards a particular bubble (estimates of the time needed to plan and execute an attentional saccade typically range between 50 and 85 ms; e.g., Wolfe, 1998; Wolfe, Alvarez, & Horowitz, 2000).
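As a quick check of the timing figures reported above, the following minimal Python sketch recomputes the frame duration, the total stimulus duration, and the temporal full width at half maximum (FWHM) of a bubble from its standard deviation, using the standard Gaussian relation FWHM = 2·sqrt(2 ln 2)·σ. The constant and function names are illustrative assumptions; they do not come from the original Matlab program.

```python
import math

REFRESH_HZ = 120   # monitor refresh rate reported in the text
N_FRAMES = 12      # frames per bubblized movie
SIGMA_MS = 17.3    # temporal standard deviation of a bubble, as reported


def frame_duration_ms(refresh_hz):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / refresh_hz


def fwhm_from_sigma(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


frame_ms = frame_duration_ms(REFRESH_HZ)
stimulus_ms = N_FRAMES * frame_ms
temporal_fwhm_ms = fwhm_from_sigma(SIGMA_MS)

print(f"frame: {frame_ms:.2f} ms, stimulus: {stimulus_ms:.0f} ms, "
      f"temporal FWHM: {temporal_fwhm_ms:.1f} ms")
```

Under these assumptions the sketch gives about 8.33 ms per frame, 100 ms per movie, and a temporal FWHM of about 40.7 ms, in line with the values stated in the text.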

Procedure

Each participant performed 26,000 trials, each letter of the alphabet being presented an average of 1,000 times. The experiment was divided into 100 blocks of 260 trials each. Letter identification accuracy was maintained at 51% by adjusting the number of bubbles on a trial-by-trial basis using QUEST (Watson & Pelli, 1983). Since visual processing difficulty may vary across


the alphabet, the number of space–time bubbles was adjusted independently for each letter. The initial number of bubbles for each letter was determined individually for each participant by administering two practice blocks, each composed of 50 trials, before the experimental blocks began.

On each trial, a homogeneous grey screen was first displayed for 250 ms, accompanied by a 122-ms 1300-Hz pure tone to signal the beginning of the trial. The grey screen was immediately replaced by a bubblized letter movie that lasted 100 ms. This was immediately followed by a homogeneous grey screen that remained visible until the participant responded. The task was to identify the target letter, and participants registered their responses by pressing the appropriate key on the keyboard. The next trial was triggered automatically after a 2-s intertrial interval. Participants received no feedback on their performance.

Results and discussion

The number of bubbles necessary to maintain performance at 51% correct for each letter of the alphabet is reported in Table 1. The efficient use of the spatio-temporal information in the stimulus was determined by performing a multiple linear regression on the bubbles’ volumes (explanatory variables) and the participants’ response accuracy (predicted variable). That is, we constructed, for each participant, one regression coefficient volume (the two spatial dimensions and the temporal dimension) for each letter of the alphabet by subtracting, for a given letter, the sum of the bubbles’ volumes that led to an incorrect response from the sum of the bubbles’ volumes that led to a correct response. These volumes of regression coefficients are referred to as classification movies, a straightforward extension of classification images. The elements of these movies are referred to as voxels (by analogy to pixels in classification images).
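The sum-and-subtract computation described above can be sketched in a few lines of Python. This is a minimal illustration on tiny nested lists, not the authors' Matlab code: the function name `classification_movie`, the `[frame][row][column]` layout, and the toy trial data are all assumptions made for the example.

```python
def classification_movie(masks, correct):
    """Sum the bubble volumes of correct trials and subtract those of incorrect ones.

    masks   -- list of bubble volumes, each indexed as [frame][row][column]
    correct -- list of booleans, one per trial (True = correct response)
    """
    n_frames = len(masks[0])
    n_rows = len(masks[0][0])
    n_cols = len(masks[0][0][0])
    movie = [[[0.0] * n_cols for _ in range(n_rows)] for _ in range(n_frames)]
    for mask, was_correct in zip(masks, correct):
        sign = 1.0 if was_correct else -1.0
        for t in range(n_frames):
            for r in range(n_rows):
                for c in range(n_cols):
                    movie[t][r][c] += sign * mask[t][r][c]
    return movie


# Hypothetical toy data: three trials, each with 2 frames of 1 x 2 pixels.
toy_masks = [
    [[[1.0, 0.0]], [[0.0, 0.0]]],  # trial 1: left pixel revealed in frame 1
    [[[1.0, 0.0]], [[0.0, 1.0]]],  # trial 2: left pixel (frame 1), right pixel (frame 2)
    [[[0.0, 1.0]], [[0.0, 1.0]]],  # trial 3: right pixel revealed in both frames
]
toy_correct = [True, True, False]
toy_movie = classification_movie(toy_masks, toy_correct)
```

In this toy case the left pixel of the first frame accumulates the largest positive value, which is the sense in which a classification movie points to the space–time locations associated with correct responses.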

If all 64 × 64 × 12 voxels were of equal importance for successful letter identification, they would have uniform regression values. Any local divergence from uniformity indicates that this particular part of the stimulus (in space–time) was particularly important for the task at hand. The statistical analysis was restricted to the central horizontal strip in the classification movies (40 × 64 × 12 voxels) where the letters were located. The strips above and below

Figure 1. Illustration of the spatio-temporal stimulus sampling. A total of 12 frames were presented successively at a rate of 120 Hz (i.e., 8.33 ms per image). In these frames, visual information was randomly sampled in space and time using Gaussian apertures (bubbles) with a spatial and a temporal extent. The standard deviation of the bubbles on the time dimension was chosen such that the duration of one bubble was shorter than the time required to plan and execute an attentional saccade.



were used to estimate the mean and the standard deviation of the null distribution and to transform the classification movies into Z-scores. A total of 26,000 trials might seem like a lot, but in this case it is not enough to obtain one classification movie per letter for each participant. We thus summed the classification movies transformed into Z-scores across the 4 participants and, to transform the resulting group classification movies into Z-scores, divided them by $\sqrt{4}$. Finally, to determine the letter space–time information significantly correlated with accuracy, we conducted a one-tailed Pixel test (Chauvin, Worsley, Schyns, Arguin, & Gosselin, 2005) on the group classification movies transformed into Z-scores (Sr = 30,720 voxels; full-width half maximum = 2.66, i.e., the geometric mean of the spatial and the time full-width half maximums; Zcrit = 4.46; p < .001). The statistical threshold provided by this test corrects for multiple comparisons while taking the spatial and temporal correlation inherent to our technique into account.

Movies directly representing the space–time use of letter information are available at http://www.mapageweb.umontreal.ca/gosselif/dynamic_letters. Figure 2 depicts the same results in two dimensions while losing as little information as possible. In a statistically thresholded classification movie, some significant voxels are connected together, and some are not. For example, on the bottom termination of the letter “I” (see Figure 2), it is likely that more than one voxel will be useful, and that most of the useful voxels on that letter feature will be connected in space or in time. In contrast, the voxels located on the top termination of the letter “I” may not be connected with those of the bottom termination since both groups are far away from each other. We contend that each cluster of connected significant voxels is a letter feature. The temporal dimension of classification movies also informs us about the order in which these letter features are acquired. Therefore, we divided the significant voxels into space–time clusters of connected significant voxels. More precisely, we searched for so-called “26-connected” voxels (i.e., adjacent either in one of the six cardinal directions or in one of the 20 oblique directions). Note that a cluster of significant voxels could contain as few as one voxel. We then collapsed the time dimension of the classification movies to represent the three dimensions of our results on a two-dimensional figure; each cluster was reduced to its spatial silhouette. We depicted the different cluster silhouettes in different colours to facilitate interpretation. The times of onset and offset of every cluster are indicated in white next to the cluster.
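The clustering step described above, grouping significant voxels that are 26-connected and then collapsing each cluster onto its spatial silhouette, can be sketched as a breadth-first search in Python. This is a minimal illustration, not the authors' implementation; the `(x, y, t)` tuple representation and the function names are assumptions.

```python
from collections import deque
from itertools import product

# The 26 neighbour offsets in a 3 x 3 x 3 block: 6 cardinal plus 20 oblique.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_clusters(significant_voxels):
    """Split a set of (x, y, t) voxels into 26-connected clusters."""
    remaining = set(significant_voxels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster = {seed}
        queue = deque([seed])
        while queue:
            x, y, t = queue.popleft()
            for dx, dy, dt in OFFSETS:
                neighbour = (x + dx, y + dy, t + dt)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.add(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return clusters


def silhouette_and_span(cluster):
    """Collapse a cluster over time; return its (x, y) silhouette, first and last frame."""
    silhouette = {(x, y) for x, y, _ in cluster}
    frames = [t for _, _, t in cluster]
    return silhouette, min(frames), max(frames)


# Hypothetical example: two diagonally adjacent voxels and one isolated voxel.
clusters = label_clusters({(0, 0, 0), (1, 1, 1), (5, 5, 5)})
```

The isolated voxel forms a cluster of its own, consistent with the remark that a cluster can contain a single voxel. Multiplying the first and last frame indices by the frame duration would give onset and offset times of the kind shown in Figure 2.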

As mentioned in the Introduction, the literature on letter identification has already proposed various sets of letter features that are assumed to underlie identification. The issue of the relative impact of these different features on letter identification has generated significant interest. Because the Bubbles method is pixel or voxel based, it does not require an a priori definition of

Table 1. Average number of bubbles required to maintain performance at 51% correct at the end of the experiment

Letter    Number of bubbles
A          50.0
B          84.6
C          78.2
D          58.2
E          68.2
F          82.1
G          58.8
H          38.2
I         132.0
J          94.7
K          38.9
L          92.4
M          47.8
N          43.7
O         108.2
P          88.7
Q          91.4
R          72.9
S          59.1
T          47.9
U          62.2
V          59.4
W          39.1
X          40.6
Y          46.4
Z          45.4




what the features for letter identification are. As we have described above, the classification movies as well as Figure 2 reveal, by themselves, the shape and the position of the efficient features (clusters of significant voxels). However, it is possible to decompose the classification movies into any set of features so as to assess the degree to which it accounts for recognition performance (e.g., B. Gibson, Lazareva, Gosselin, Schyns, & Wasserman, 2007). To bridge the gap between the letter identification literature and the experiment reported in this article, we determined the relative importance of the features proposed in the letter identification literature. More specifically, we conducted a priori feature analyses for all the letters of the alphabet grouped together (similarly to Fiset et al., 2008) as well as for each letter separately (Figure 3). We first created 111 templates by decomposing each letter into the full complement of local features that have been

Figure 3. Templates of letter features used for the feature analysis. One template was created for each feature (i.e., intersections, horizontals, verticals, slants tilted right, slants tilted left, curves opened at the top, curves opened at the bottom, curves opened on the left, curves opened on the right, and terminations) present in each letter of the alphabet. The pixels comprised in each template are depicted in grey.

Figure 2. Colours show space–time clusters (collapsed on the time dimension) significantly correlated with correct letter identification (p < .001) superimposed upon the appropriate letter. Four colours were used to help cluster segregation. The numbers in white near each cluster indicate the beginning and end of this cluster relative to stimulus onset. To view a colour version of this figure, please see the online issue of the Journal.



proposed in the literature, namely vertical, horizontal, slant tilted left or right, curves opened up, down, left, or right, and intersections; global features such as symmetry, cyclic change, and parallelism were not considered. We also included terminations, a feature that has only been considered by Fiset et al. (2008). The terminations and intersections were defined as letter ink within a radius of 13 pixels of the centre of the feature, as identified by the authors. To make sure that the masks for these features were independent from those of other features, we subtracted the area corresponding to the terminations and intersections from the other feature masks. We then calculated, for each letter and frame, the proportion of significant voxels falling on each feature. Only the voxels falling directly on letter print were included in the analysis. The vectors of 10 proportions were each normalized to 1 in order to reveal the relative importance of all features for each letter. The results of this analysis for each individual letter are presented in Figure 4. Error bars were computed via bootstrap; 1,000 group classification movies were computed by summing four classification images made of pixels sampled randomly, with replacement, from the four classification movies of the participants. In fact, the video clips available at http://mapageweb.umontreal.ca/gosselif/dynamic_letters depict the sum of these bootstrap classification movies. Bright red means that the pixels are present on 100% of classification movies, black means that they are present on 0.1%, and grey letter and white background mean that they are present in none of the classification movies. These video clips thus indicate the between-subject variability. To compute the error bars in Figure 4, the group classification images obtained via bootstrap were analysed in exactly the same way as the empirical group classification image: They

Figure 4. Results of the feature analysis performed on each letter and each frame. Each graph shows the relative importance of the features comprised in each letter of the alphabet. Note that if no significant pixel fell on one of the features comprised in a letter (e.g., no significant pixel fell on the vertical bar in letter “B”), there is no curve corresponding to this feature in the graph. Error bars indicate 95% confidence intervals. To view a colour version of this figure, please see the online issue of the Journal.




were smoothed, were transformed into Z-scores, were submitted to Pixel tests, and underwent feature analyses. The error bars correspond to 1.96 times the standard deviation observed in the simulated feature analyses, that is, to 95% confidence intervals. Most of the between-subject variability occurs at the beginning and at the end of stimulus duration, which could be due to differences in the phase or the rapidity with which participants start to process information. The low intersubject variability at the middle of stimulus presentation suggests that the phase difference in the information extraction process of our participants was relatively small.
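The per-letter feature analysis and its error bars can be illustrated with the following Python sketch. It reflects one reading of the procedure, in which the count of significant pixels on each feature template is normalized so that the proportions sum to 1; the set-of-pixels representation, the toy templates, and the function names are assumptions introduced for the example, and the sample standard deviation is assumed for the bootstrap spread.

```python
import statistics


def remove_overlap(feature_pixels, point_feature_pixels):
    """Subtract termination or intersection areas from another feature mask."""
    return feature_pixels - point_feature_pixels


def relative_importance(significant_pixels, templates):
    """Share of significant pixels on each feature template, normalized to sum to 1."""
    counts = {name: len(significant_pixels & pixels)
              for name, pixels in templates.items()}
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in templates}
    return {name: count / total for name, count in counts.items()}


def ci95_half_width(bootstrap_values):
    """Half-width of a 95% interval taken as 1.96 times the bootstrap SD."""
    return 1.96 * statistics.stdev(bootstrap_values)


# Hypothetical one-frame example for a vertical bar with two terminations.
terminations = {(1, 0), (1, 3)}
vertical = remove_overlap({(1, 0), (1, 1), (1, 2), (1, 3)}, terminations)
templates = {"vertical": vertical, "terminations": terminations}
importance = relative_importance({(1, 0), (1, 1), (1, 3)}, templates)
```

Here two of the three significant pixels fall on the terminations, so terminations receive two thirds of the relative importance and the vertical bar one third.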

Different spatio-temporal patterns may be observed. For some letters, one feature remains useful from the beginning to the end of the stimulus presentation. This is the case, for example, with the terminations in letter “I” and with the horizontal in letter “G”. A second spatio-temporal pattern that may be observed in the results is the simultaneous presence of two or more letter features. For example, in letter “G”, the relative usefulness of the terminations, the horizontal bar, and the intersection is approximately constant across time. Finally, for other letters, one feature appears early in the classification image, then disappears, and, sometimes, another feature appears. For example, in letter “U”, the curve opened at the top is essentially the only useful feature from 17 to 42 ms; in letter “W”, terminations are the most useful feature from 25 to 42 ms after stimulus onset, and then slants tilted right become the most useful feature from 50 to 100 ms after stimulus onset.

To compare our results with those of Fiset et al. (2008), we combined the results across the 26 letters of the alphabet and across time, for each feature class, and divided that grand total by the number of letters containing a given feature. This resulted in a vector of 10 numbers that was normalized to 1 in order to reveal the relative importance of all features across all the letters of the alphabet (see Figure 5; error bars in Figure 5 were computed via bootstrap like the error bars in Figure 4). Terminations and horizontals are the most important features for uppercase Arial letter identification.
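The pooling across letters described above (sum per feature class, divide by the number of letters containing that feature, then normalize to 1) can be sketched as follows. The dictionary layout, the function name, and the toy values are assumptions made for illustration and are not data from the experiment.

```python
def overall_importance(per_letter):
    """Pool feature importance across letters.

    per_letter maps each letter to a dict of {feature: importance summed over
    frames}, listing only the features that the letter actually contains.
    """
    totals = {}
    letter_counts = {}
    for features in per_letter.values():
        for name, value in features.items():
            totals[name] = totals.get(name, 0.0) + value
            letter_counts[name] = letter_counts.get(name, 0) + 1
    # Divide each grand total by the number of letters containing the feature.
    means = {name: totals[name] / letter_counts[name] for name in totals}
    scale = sum(means.values())
    # Normalize so that the feature classes sum to 1.
    return {name: mean / scale for name, mean in means.items()}


# Hypothetical values for two letters.
toy = {
    "I": {"terminations": 1.0},
    "T": {"terminations": 0.5, "horizontals": 0.5},
}
pooled_importance = overall_importance(toy)
```

Dividing by the number of letters that contain a feature keeps common features from dominating merely because they occur in more letters.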

GENERAL DISCUSSION

We used Bubbles, a classification image technique, to reveal the letter areas responsible for the accurate identification of uppercase Arial letters in space–time. The space–time clusters that are significantly correlated with letter identification are shown for every letter in Figure 2 (movies are available from http://mapageweb.umontreal.ca/gosselif/dynamic_letters). These, we contend, are the space–time features for letter identification. Nonetheless, to create a link with the literature, we examined the relative importance of 10 feature classes that have been proposed to be important for letter identification (i.e., intersections, horizontals, verticals, slants tilted right, slants tilted left, curves opened at the top, curves opened at the bottom, curves opened on the left, curves opened on the right, and terminations). In the “General discussion”, we focus on these a priori feature analyses because they suffice for the arguments put forth and because they should make the arguments more concise.

In the first feature analysis, we computed the importance of each feature class for every letter and frame (see Figure 4). If human observers processed the features of letters simultaneously at

Figure 5. Results of the overall feature analysis, with all letters and all frames combined. Error bars indicate 95% confidence intervals. To view a colour version of this figure, please see the online issue of the Journal.



different but constant speeds (henceforth we speak of simple parallel observers), the relative importance of the features would be invariant across frames in their classification movies (McCabe, Blais, & Gosselin, 2005). To illustrate, consider the following toy problem: A simple parallel observer is exposed for the duration of two frames (f1 and f2) to a pseudoletter composed of two parts (p1 and p2), each sufficient to identify the pseudoletter. Bubble masks can be represented as 2 × 2 matrices:

$$\begin{bmatrix} p_1 f_2 & p_2 f_2 \\ p_1 f_1 & p_2 f_1 \end{bmatrix},$$

where a cell is equal to 1 when there is a bubble and to 0 otherwise. Suppose that the observer requires only one frame to process p1 and two frames to process p2. A total of 13 bubble masks (out of a possible 15) meet those criteria and lead to a correct identification:

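$$
\begin{bmatrix}0&0\\1&0\end{bmatrix},\;
\begin{bmatrix}1&0\\0&0\end{bmatrix},\;
\begin{bmatrix}0&1\\1&0\end{bmatrix},\;
\begin{bmatrix}0&0\\1&1\end{bmatrix},\;
\begin{bmatrix}1&0\\0&1\end{bmatrix},\;
\begin{bmatrix}1&1\\0&0\end{bmatrix},\;
\begin{bmatrix}1&0\\1&0\end{bmatrix},\;
\begin{bmatrix}0&1\\0&1\end{bmatrix},\;
\begin{bmatrix}0&1\\1&1\end{bmatrix},\;
\begin{bmatrix}1&1\\0&1\end{bmatrix},\;
\begin{bmatrix}1&0\\1&1\end{bmatrix},\;
\begin{bmatrix}1&1\\1&0\end{bmatrix},\;\text{and}\;
\begin{bmatrix}1&1\\1&1\end{bmatrix};
$$

and the remaining two do not:

$$
\begin{bmatrix}0&0\\0&1\end{bmatrix}\;\text{and}\;
\begin{bmatrix}0&1\\0&0\end{bmatrix}.
$$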

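The classification movie computations described in the “Results and discussion” section (in the ongoing example, it is rather a classification image) would consist in summing up all bubble masks weighted by plus or minus 1/(number of bubbles), respectively, if the mask led to a correct or an incorrect letter identification:

$$
\begin{aligned}
&\begin{bmatrix}0&0\\1&0\end{bmatrix}
+\begin{bmatrix}1&0\\0&0\end{bmatrix}
+\tfrac{1}{2}\begin{bmatrix}0&1\\1&0\end{bmatrix}
+\tfrac{1}{2}\begin{bmatrix}0&0\\1&1\end{bmatrix}\\
&+\tfrac{1}{2}\begin{bmatrix}1&0\\0&1\end{bmatrix}
+\tfrac{1}{2}\begin{bmatrix}1&1\\0&0\end{bmatrix}
+\tfrac{1}{2}\begin{bmatrix}1&0\\1&0\end{bmatrix}
+\tfrac{1}{2}\begin{bmatrix}0&1\\0&1\end{bmatrix}\\
&+\tfrac{1}{3}\begin{bmatrix}0&1\\1&1\end{bmatrix}
+\tfrac{1}{3}\begin{bmatrix}1&1\\0&1\end{bmatrix}
+\tfrac{1}{3}\begin{bmatrix}1&0\\1&1\end{bmatrix}
+\tfrac{1}{3}\begin{bmatrix}1&1\\1&0\end{bmatrix}\\
&+\tfrac{1}{4}\begin{bmatrix}1&1\\1&1\end{bmatrix}
-\begin{bmatrix}0&0\\0&1\end{bmatrix}
-\begin{bmatrix}0&1\\0&0\end{bmatrix}\\
&=\begin{bmatrix}3.75&1.75\\3.75&1.75\end{bmatrix}.
\end{aligned}
$$

The relative importance of the two parts is the same across frames, and this is always true of simple parallel observers.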
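The toy computation above can be checked by brute force. The Python sketch below enumerates the 15 non-empty 2 × 2 bubble masks, decides for each whether the simple parallel observer of the example would respond correctly (p1 revealed in at least one frame, or p2 revealed in both frames), and accumulates the masks weighted by plus or minus 1/(number of bubbles). The function names and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product


def is_correct(mask):
    """Toy simple parallel observer; mask = ((p1f2, p2f2), (p1f1, p2f1))."""
    (p1f2, p2f2), (p1f1, p2f1) = mask
    sees_p1 = p1f1 or p1f2    # one frame is enough to process p1
    sees_p2 = p2f1 and p2f2   # p2 needs both frames
    return bool(sees_p1 or sees_p2)


def toy_classification_image():
    """Sum all non-empty masks weighted by +/- 1/(number of bubbles)."""
    total = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]
    n_correct = 0
    for bits in product((0, 1), repeat=4):
        n_bubbles = sum(bits)
        if n_bubbles == 0:
            continue  # the empty mask is not among the 15 masks considered
        mask = ((bits[0], bits[1]), (bits[2], bits[3]))
        correct = is_correct(mask)
        n_correct += correct
        weight = Fraction(1 if correct else -1, n_bubbles)
        for r in range(2):
            for c in range(2):
                total[r][c] += weight * mask[r][c]
    return total, n_correct


image, n_correct = toy_classification_image()
# Ratio of p1 to p2 importance in each frame (top row = f2, bottom row = f1).
ratios = [row[0] / row[1] for row in image]
```

The enumeration finds 13 correct masks and a classification image with 3.75 in both p1 cells and 1.75 in both p2 cells, so the p1-to-p2 ratio is identical in the two frames.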

Our results do not fully support the hypothesis that humans are simple parallel observers. Indeed, there are modulations of the relative importance of the feature classes across time in some letters (see Figure 4). This is particularly clear for letters “C”, “F”, “M”, “U”, “W”, and “Z”. But the hypothesis does fit with the results obtained for some letters. In letter “G”, for example, the relative usefulness of the terminations, the horizontal bar, and the intersection is approximately constant across time. Interestingly, these three features are next to each other spatially in letter “G” (see also Figure 2). This may result from the fact that these features fall within the “spotlight” of attention, which permits their simultaneous processing. Note, however, that our analysis does not allow us to infer whether two regions (or more) that are simultaneously above statistical threshold in our classification movies are actually processed simultaneously. Thus, it could be that on some trials, participants used one region, and, on other trials, they used the other region.

In the second a priori feature analysis, we computed the importance of each feature class with all letters and frames combined (see Figure 5). The most important outcome of this analysis is the discovery of the prime importance of terminations and horizontals. The case of




terminations is especially surprising because no research team has ever suggested that terminations could be key features for letter identification except Fiset et al. (2008). Indeed, Fiset et al. conducted a Bubbles experiment to uncover the spatial features for uppercase and lowercase Arial letter identification at different spatial scales; they did not sample time. The results of the a priori feature analyses in the two studies are strikingly similar (r = .96). Apart from intersections, which came in fourth position in Fiset et al. and are in sixth position here, the order of importance of the nine remaining features is exactly the same. This is striking given that the stimulation parameters used in Fiset et al. differed greatly from those used in the present experiment: Fiset et al. used spatial bubbles with a standard deviation of 0.72, 0.36, 0.18, 0.09, and 0.045 letter width, respectively, from the lowest to the highest spatial frequency band along which stimuli were filtered (1–2, 2–4, 4–8, 8–16, and 16–32 cycles per letter), instead of randomly located space–time bubbles with a standard deviation of 0.13 letter width across space and with a standard deviation of 40.8 ms across time; Fiset et al. displayed the letters for 200 ms instead of 100 ms; and Fiset et al. used letters with an average width of 1.35 degrees of visual angle instead of 0.78 degrees of visual angle (reducing letter size is known to induce a shift in the use of spatial information toward lower spatial frequencies; e.g., Majaj et al., 2002). This suggests that the results obtained in the experiment reported in this article are robust to parameter changes and that they generalize to different experimental conditions. However, it could also be that the results are an artefact of the a priori feature analysis and letter statistics. Fiset et al. also report an a priori feature analysis applied to the classification images of an ideal observer. An ideal observer optimally uses all the information available to perform the task at hand (e.g., Solomon & Pelli, 1994). The purpose of such a model is not so much to fit human data but to understand how the human data diverge from an optimal implementation that uses all the available information, without constraint. For the ideal observer, the terminations ranked 5th and 6th out of the 10 feature classes, and the horizontals ranked 7th and 4th for lowercase and uppercase letters, respectively. The correlation between the relative importance of the features for human observers and the ideal observer is quite low (r = .16). On this basis, we are confident that the prime importance of the terminations and horizontals, in particular, and the relative importance of the other features for human participants are due to constraints imposed by the human visual system rather than by constraints imposed by the stimuli or analyses.
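To make the space–time sampling parameters of the present experiment concrete, the following is a minimal sketch of how a mask of randomly located space–time Gaussian bubbles could be generated. Only the standard deviations (0.13 letter width, 40.8 ms) and the 100 ms duration come from the text. The letter width in pixels, the frame duration, the image size, the number of bubbles, and the rule of summing bubbles and clipping at 1 are illustrative assumptions, not a description of the software actually used.

```python
import numpy as np

# Values taken from the text.
SD_SPACE_LETTER_WIDTHS = 0.13   # spatial SD of a bubble, in letter widths
SD_TIME_MS = 40.8               # temporal SD of a bubble, in milliseconds
DURATION_MS = 100.0             # stimulus duration

# Hypothetical display values (assumptions, NOT from the article).
LETTER_WIDTH_PX = 50            # assumed letter width in pixels
FRAME_MS = 10.0                 # assumed duration of one video frame
N_BUBBLES = 20                  # assumed number of bubbles per trial


def space_time_bubble(shape, center, sd_space_px, sd_time_frames):
    """Return a 3-D Gaussian window (frames, rows, cols) that peaks at 1."""
    n_frames, n_rows, n_cols = shape
    t0, y0, x0 = center
    t = np.arange(n_frames)[:, None, None]
    y = np.arange(n_rows)[None, :, None]
    x = np.arange(n_cols)[None, None, :]
    spatial = ((y - y0) ** 2 + (x - x0) ** 2) / (2.0 * sd_space_px ** 2)
    temporal = (t - t0) ** 2 / (2.0 * sd_time_frames ** 2)
    return np.exp(-(spatial + temporal))


def bubbles_mask(shape, n_bubbles, sd_space_px, sd_time_frames, rng):
    """Sum randomly located bubbles and clip to [0, 1] (assumed rule)."""
    mask = np.zeros(shape)
    for _ in range(n_bubbles):
        center = [rng.integers(0, n) for n in shape]
        mask += space_time_bubble(shape, center, sd_space_px, sd_time_frames)
    return np.clip(mask, 0.0, 1.0)


n_frames = int(round(DURATION_MS / FRAME_MS))
rng = np.random.default_rng(0)
mask = bubbles_mask(
    shape=(n_frames, 64, 64),
    n_bubbles=N_BUBBLES,
    sd_space_px=SD_SPACE_LETTER_WIDTHS * LETTER_WIDTH_PX,
    sd_time_frames=SD_TIME_MS / FRAME_MS,
    rng=rng,
)
# Multiplying a letter movie (same shape) by `mask` reveals the letter
# only where and when the bubbles are open.
```

Under these assumed display values, one spatial standard deviation corresponds to 6.5 pixels and one temporal standard deviation to about 4 frames, so each bubble covers a substantial part of the 100 ms presentation.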

But why exactly are terminations and horizontals so important for human letter identification? Regarding terminations, one possible hypothesis is that, because they are located on the extremities of letters, they are less likely to suffer from visual crowding than other features. This hypothesis is supported by the results of Fiset et al. (2008) with lowercase letters. Indeed, although terminations were the most important feature for both letter cases, they were more important relative to the other features for lowercase letters than for uppercase letters. Since the distance between the terminations and the rest of the letter is, on average, larger in lowercase than in uppercase letters (i.e., because of their extensions), their sparing from crowding should be more important. This hypothesis also predicts that terminations should become even more important in word recognition, where crowding is further increased by adjacent letters. Interestingly, Chung, Tjan, and Lin (2008) showed that the extremities of lowercase letters are very useful for the correct identification of the middle letter in random triplets of letters. Regarding horizontals, it is possible that their importance also comes from a reduction of crowding when they are part of a letter. In fact, horizontals create some space either around the letter (e.g., in letter T) or within the letter (e.g., letter H), therefore reducing the crowding between letters in a letter string, or between features in an isolated letter. Other explanations are possible, however, and additional research will be necessary to understand why terminations and horizontals are important for letter identification.



One other important question that remains is: Why are the letter features extracted in the particular temporal order found here (see Figure 4)? We have examined different hypotheses but none has yet proven effective in correctly accounting for the findings reported above. For instance, we have examined whether the order of feature extraction follows a systematic spatial pattern. That is, were the different spatial locations (e.g., upper left quadrant, upper middle quadrant, upper right quadrant, etc.) processed in a systematic order? On each frame, we found a similar number of significant pixels across the stimulus areas, which led to a rejection of this hypothesis. We have also implemented an optimal sequential model, somewhat similar to “Mr. Chips” (Legge, Klitz, & Tjan, 1997), to reveal the “optimal” order of feature extraction. In this model, on each time frame, a Gaussian window (we tried window sizes of standard deviations ranging from 2 to 9 pixels but this made little difference) was moved across the spatial extent of the letter to find which group of pixels would maximally decrease the uncertainty (i.e., minimize the entropy) about the target identity given the information that had already been accumulated before. A feature analysis across all letters revealed that this optimal sequential model did not primarily use terminations. Moreover, when similar features were used by the ideal and the human observers for a given letter, the order in which they were used usually differed. For letter “E”, for example, our optimal model first used the middle horizontal bar and then moved to the lower one, whereas for the human observers, this order was reversed. Overall, then, a straightforward account of the order in which the letter features become useful for normal human readers has yet to be uncovered. Future studies will be needed to further our understanding of the dynamics of letter identification.
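The logic of the optimal sequential model described above can be illustrated with a minimal sketch. It is not the implementation used for the analyses reported here: it assumes a noise-free view of the target through the Gaussian window, a Gaussian pixel-noise likelihood with a hypothetical standard deviation `noise_sd`, a uniform prior over letters, and toy 5 × 5 "letters" made of horizontal bars. On each frame, the window is placed wherever the posterior entropy about the target identity is lowest, given the evidence accumulated on earlier frames.

```python
import numpy as np


def gaussian_window(shape, center, sd):
    """2-D Gaussian window peaking at 1 on `center` (row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sd ** 2))


def entropy_bits(p):
    """Shannon entropy of a probability vector, in bits."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def posterior(evidence):
    """Posterior over letters from accumulated evidence against each one."""
    logp = -evidence
    logp -= logp.max()          # numerical stability
    p = np.exp(logp)
    return p / p.sum()


def greedy_feature_order(templates, target, n_frames, sd, noise_sd=1.0):
    """Greedily pick, frame by frame, the window location that minimizes
    the posterior entropy about the target identity."""
    shape = templates[0].shape
    diff2 = (templates - templates[target]) ** 2   # (letters, rows, cols)
    evidence = np.zeros(len(templates))
    order = []
    for _ in range(n_frames):
        best = None
        for center in np.ndindex(shape):
            w2 = gaussian_window(shape, center, sd) ** 2
            # Windowed squared distance of each template to the target view.
            gain = (diff2 * w2).sum(axis=(1, 2)) / (2.0 * noise_sd ** 2)
            h = entropy_bits(posterior(evidence + gain))
            if best is None or h < best[0]:
                best = (h, center, gain)
        evidence = evidence + best[2]
        order.append((best[1], best[0]))   # (window center, entropy in bits)
    return order


# Toy alphabet (hypothetical): three 5 x 5 patterns built from bars.
templates = np.zeros((3, 5, 5))
templates[0, 0, :] = 1.0     # top bar only (the target)
templates[1, 4, :] = 1.0     # bottom bar only
templates[2, 0, :] = 1.0     # both bars
templates[2, 4, :] = 1.0
order = greedy_feature_order(templates, target=0, n_frames=2, sd=1.0)
```

In this toy alphabet the first window lands on the middle of the bottom row, the only region that separates the target from both alternatives. A sequence obtained this way for real letter templates is what would be compared with the order in which features become significant for human observers.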

Manuscript received 19 February 2008
Revised manuscript received 21 July 2008
Revised manuscript accepted 17 August 2008
First published online day month year

REFERENCES

Blais, C., Fiset, D., Arguin, M., Jolicoeur, P., & Gosselin, F. (2008). A routine of visual information extraction in word recognition. Manuscript in preparation.

Boles, D. B., & Clifford, J. E. (1989). An upper- and lowercase alphabetic similarity matrix, with derived generation similarity values. Behavior Research Methods, Instruments, & Computers, 21, 579–586.

Bouma, H. (1971). Visual recognition of isolated lower-case letters. Vision Research, 11, 459–474.

Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.

Briggs, R., & Hocevar, D. J. (1975). A new distinctive theory for upper case letters. The Journal of General Psychology, 93, 87–93.

Chauvin, A., Worsley, K. J., Schyns, P. G., Arguin, M., & Gosselin, F. (2005). Accurate statistical tests for smooth classification images. Journal of Vision, 5, 659–667.

Chung, S. T. L., Legge, G. E., & Tjan, B. S. (2002). Spatial-frequency characteristics of letter identification in central and peripheral vision. Vision Research, 42, 2137–2152.

Chung, Tjan, & Lin (2008)

Eckstein, M. P., & Ahumada, A. J. (Eds.). (2002). Classification images: A tool to analyze visual strategies [Special issue]. Journal of Vision, 2(1), i-i. Retrieved from http://journalofvision.org/2/1/i/, doi:10.1167/2.1.i.

Fiset, D., Blais, C., Ethier-Majcher, C., Arguin, M., Bub, D. N., & Gosselin, F. (2008). Features for uppercase and lowercase letter identification. Manuscript submitted for publication.

Gervais, M. J., Harvey, L. O., & Roberts, J. O. (1984). Identification confusions among letters of the alphabet. Journal of Experimental Psychology: Human Perception and Performance, 10, 655–666.

Geyer, L. H. (1977). Recognition and confusion of the lowercase alphabet. Perception & Psychophysics, 22, 487–490.

Geyer, L. H., & DeWald, C. G. (1973). Feature lists and confusion matrices. Perception & Psychophysics, 14, 471–482.

Gibson, B., Lazareva, O. F., Gosselin, F., Schyns, P. G., & Wasserman, E. A. (2007). Non-accidental properties underlie shape recognition in mammalian and non-mammalian vision. Current Biology, 17, 336–340.




Gibson, E. J. (1969). Principles of perceptual learning and development. New York: Meredith.

Gibson, E. J., Gibson, J. J., Pick, A. D., & Osser, H. A. (1962). A developmental study of the discrimination of letter-like forms. Journal of Comparative & Physiological Psychology, 55, 897–908.

Gilmore, G. C., Hersh, H., Caramazza, A., & Griffin, J. (1979). Multidimensional letter similarity derived from recognition errors. Perception & Psychophysics, 25, 425–431.

Ginsburg, A. P. (1980). Specifying relevant spatial information for image evaluation and display design: An explanation of how we see certain objects. Proceedings of the SID, 21, 219–227.

Gosselin, F., & Schyns, P. G. (2001). Bubbles: A technique to reveal the use of information in recognition. Vision Research, 41, 2261–2271.

Gosselin, F., & Schyns, P. G. (2004). A picture is worth thousands of trials: Rendering the use of visual information from spiking neurons to recognition. Cognitive Science, 28, 141–146.

Holbrock, M. B. (1975). A comparison of methods for measuring the inter-letter similarity between capital letters. Perception & Psychophysics, 17, 532–536.

Laughery, K. R. (1971). Computer simulation of short-term memory: A component decay model. In G. T. Bower & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 6). New York: Academic Press.

Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal-observer model of reading. Psychological Review, 104, 524–553.

Legge, G. E., Pelli, D. G., Rubin, G. S., & Schleske, M. M. (1985). Psychophysics of reading: I. Normal vision. Vision Research, 25, 239–252.

Lindsay, P. H., & Norman, D. A. (1977). Human information processing: An introduction to psychology. New York: Academic Press.

Loomis, J. M. (1982). Analysis of tactile and visual confusion matrices. Perception & Psychophysics, 31, 41–52.

Majaj, N. J., Pelli, D. G., Kurshan, P., & Palomares, M. (2002). The role of spatial frequency channels in letter identification. Vision Research, 42, 1165–1184.

Matlin, M. W. (2005). Cognition. USA: John Wiley & Sons.

Mazer, J. A., Vinje, W. E., McDermott, J., Schiller, P. H., & Gallant, J. L. (2002). Spatial frequency and orientation tuning dynamics in area V1. Proceedings of the National Academy of Sciences, USA, 99, 1645–1650.

McCabe, E., Blais, C., & Gosselin, F. (2005). Effective categorization of objects, scenes, and faces through time. In C. Lefebvre & H. Cohen (Eds.), Handbook of categorization in cognitive science (pp. 767–791). Amsterdam: Elsevier.

Medin, D. L., Ross, B. H., & Markman, A. B. (2005). Cognitive psychology. USA: John Wiley & Sons.

Neisser, U. (1967). Cognitive psychology. New York: Appleton-Century-Crofts.

Parish, D. H., & Sperling, G. (1991). Object spatial frequencies, retinal. Vision Research, 31, 1399–1416.

Pelli, D. G. (1997). The Video Toolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437–442.

Pelli, D. G., Burns, C. W., Farell, B., & Moore-Page, D. C. (2006). Feature detection and letter identification. Vision Research, 46, 4646–4674.

Pelli, D. G., Farell, B., & Moore, D. C. (2003). The remarkable inefficiency of word recognition. Nature, 423, 752–756.

Rumelhart, D. E., & Siple, P. (1974). Process of recognizing tachistoscopically presented words. Psychological Review, 81, 99–118.

Solomon & Pelli, 1994

Townsend, J. T. (1971). Alphabetic confusion: A test of models for individuals. Perception & Psychophysics, 9, 449–454.

Van Der Heijden, A. H. C., Malhas, M. S. M., & Van Den Roovaart, B. P. (1984). An empirical interletter confusion matrix for continuous-line capitals. Perception & Psychophysics, 35, 85–88.

Vinette, C., Gosselin, F., & Schyns, P. G. (2004). Spatio-temporal dynamics of face recognition in a flash: It’s in the eyes! Cognitive Science, 28, 289–301.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113–120.

Wolfe, J. M. (1998). What can 1 million trials tell us about visual search? Psychological Science, 9, 33–39.

Wolfe, J. M., Alvarez, G. A., & Horowitz, T. S. (2000). Attention is fast but volition is slow. Nature, 406, 691.



