Identifying Fixations and Saccades in Eye-Tracking Protocols

Dario D. Salvucci
Nissan Cambridge Basic Research
Four Cambridge Center
Cambridge, MA 02142 USA
+1 617 374 9669
[email protected]

Joseph H. Goldberg
Dept. of Industrial and Manufacturing Engineering
Pennsylvania State University
310 Leonhard Building
University Park, PA 16802 USA
+1 814 863 2370
[email protected]

ABSTRACT

The process of fixation identification—separating and labeling fixations and saccades in eye-tracking protocols—is an essential part of eye-movement data analysis and can have a dramatic impact on higher-level analyses. However, algorithms for performing fixation identification are often described informally and rarely compared in a meaningful way. In this paper we propose a taxonomy of fixation identification algorithms that classifies algorithms in terms of how they utilize spatial and temporal information in eye-tracking protocols. Using this taxonomy, we describe five algorithms that are representative of different classes in the taxonomy and are based on commonly employed techniques. We then evaluate and compare these algorithms with respect to a number of qualitative characteristics. The results of these comparisons offer interesting implications for the use of the various algorithms in future work.

Keywords

Fixation identification, eye tracking, data analysis algorithms.

1. INTRODUCTION

Eye tracking has been gaining in popularity over the past decade as a window into observers’ visual and cognitive processes. For instance, researchers have utilized eye tracking to study behavior in such domains as image scanning [e.g., 12], driving [e.g., 11], arithmetic [e.g., 20], analogy [e.g., 17], and reading [e.g., 14]. In these and other domains, researchers typically analyze eye movements in terms of fixations (pauses over informative regions of interest) and saccades (rapid movements between fixations). Common analysis metrics include fixation or gaze durations, saccadic velocities, saccadic amplitudes, and various transition-based parameters between fixations and/or regions of interest.

The analysis of fixations and saccades requires some form of fixation identification (or simply identification)—that is, the translation from raw eye-movement data points to fixation locations (and implicitly the saccades between them) on the visual display. Fixation identification significantly reduces the size and complexity of the eye-movement protocol, removing raw saccade data points and collapsing raw fixation points into a single representative tuple. This reduction is useful for at least two reasons. First, little or no visual processing can be achieved during a saccade [6], and thus the actual paths traveled during saccades are typically irrelevant for many research applications. Second, smaller eye movements that occur during fixations, such as tremors, drifts, and flicks [1, 4], often mean little in higher-level analyses. Thus, fixation identification is a convenient method of minimizing the complexity of eye-tracking data while retaining its most essential characteristics for the purposes of understanding cognitive and visual processing behavior.

Fixation identification is an inherently statistical description of observed eye-movement behaviors. While it is generally agreed upon that visual and cognitive processing do occur during fixations [e.g., 9], it is less clear exactly when fixations start and when they end. Thus, regardless of the precision and flexibility associated with identification algorithms, the identification problem is still a subjective process. Therefore one efficient way to validate these algorithms is to compare resultant fixations to an observer’s subjective impressions.

Though widely employed, fixation identification and its various implementations have often been given short shrift in the eye-movement literature, particularly in literature concerning the interaction of eye movements and higher-level cognition. However, identification is often a critical aspect of eye-movement data analysis that can have significant effects on later analyses. For instance, poorly defined identification algorithms may produce too many or too few fixations, or may be overly sensitive to outlier data points, thus biasing interpretation. Karsh and Breitenbach [10] provided a rigorous demonstration of how different identification algorithms can produce vastly different interpretations even when analyzing identical protocols. Thus, good identification algorithms ensure valid fixation and saccade locations and durations, which in turn influence inferences such as processing complexity and visual search paths.

Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the Eye Tracking Research and Applications Symposium (pp. 71-78). New York: ACM Press.

In addition to the frequent underspecification of identification algorithms, little work has been done in evaluating and comparing different possible algorithms. Researchers and practitioners often have minimal information to guide their decision of which algorithm to use in particular situations. Not surprisingly, this problem leads to the somewhat haphazard application of different algorithms, making it difficult to compare results derived by different methods of identification.

In this paper we address these problems by proposing a novel taxonomy of fixation identification algorithms and evaluating representative algorithms in the context of this taxonomy. The taxonomy classifies algorithms with respect to five spatial and temporal criteria. The spatial criteria divide algorithms in terms of their use of velocity, dispersion, and area-of-interest information. The temporal criteria divide algorithms in terms of their use of duration information and their local adaptivity. The taxonomy provides a meaningful labeling and classification of existing algorithms so that they may be more easily compared in a systematic way to guide the choice of algorithms for particular applications.

To demonstrate the usefulness of the taxonomy, we identify and describe five algorithms that are representative of different classes in the taxonomy. We first provide rigorous descriptions of these algorithms in “pseudocode” form to best illustrate the similarities and differences between the algorithms. We then evaluate and compare the methods with respect to several qualitative characteristics including algorithm speed, accuracy, robustness, parameter space, and ease of implementation. We conclude with a discussion of implications and recommendations for how the various methods can be best utilized for future applications.

2. A TAXONOMY OF FIXATION IDENTIFICATION ALGORITHMS

Our taxonomy classifies fixation identification algorithms with respect to spatial and temporal characteristics, as summarized in Table 1. In constructing this taxonomy, we attempted to identify a minimal set of criteria that would best capture the differences between common existing algorithms. While we could propose a more complex taxonomy that could potentially account for finer distinctions and/or hybrid algorithms, the proposed basic taxonomy provides a useful characterization of the primary types of identification algorithms. The taxonomy will thus serve as a good starting point for comparing and evaluating existing identification algorithms, as we discuss in the following two sections.

For spatial characteristics, we identify three criteria that distinguish three primary types of algorithms: velocity-based, dispersion-based, and area-based. Velocity-based algorithms emphasize the velocity information in the eye-tracking protocols, taking advantage of the fact that fixation points have low velocities and saccade points have high velocities. (Note that, assuming a constant sampling rate, velocities are simply distances between sampled points, and thus we can ignore the temporal component implicit in velocities.) Dispersion-based algorithms emphasize the dispersion (i.e., spread distance) of fixation points, under the assumption that fixation points generally occur near one another. Area-based algorithms identify points within given areas of interest (AOIs) that represent relevant visual targets. These algorithms, unlike the others described here, provide both lower-level identification and higher-level assignment of fixations to AOIs [see 15, 16]. Because fixations can also be used as inputs to AOI algorithms, these can also represent higher levels of attentional focus on a display. These dwell times can be considered ‘macro-fixations’, in that they organize fixations into a larger picture.

For temporal characteristics, we include two criteria: whether the algorithm uses duration information, and whether the algorithm is locally adaptive. The use of duration information is guided by the fact that fixations are rarely less than 100 ms and often in the range of 200-400 ms. The incorporation of local adaptivity allows the interpretation of a given data point to be influenced by the interpretation of temporally adjacent points; this is useful, for instance, to compensate for differences between ‘steady-eyed’ individuals and those who show large and frequent eye movements.

3. REPRESENTATIVE ALGORITHMS

To enable us to rigorously compare the classes of algorithms in our taxonomy, it is convenient to formalize sample algorithms that serve to represent the essential ideas embodied by each class of algorithms. This section describes five algorithms—I-VT, I-HMM, I-DT, I-MST, and I-AOI—that extract the important aspects of numerous existing algorithms and express their basic techniques as simply as possible. Table 1 shows how the representative algorithms map onto the proposed taxonomy. Of course, there are many other possible algorithms that we could include in this analysis, such as hybrid algorithms with multiple heuristics; however, due to space constraints, we limit this exposition to five algorithms with fairly different approaches and implementations.

Table 1: Taxonomy of fixation identification algorithms and the sample algorithms presented here.

                         I-VT   I-HMM   I-DT   I-MST   I-AOI
  Spatial
    Velocity-based        X      X
    Dispersion-based                     X      X
    Area-based                                          X
  Temporal
    Duration sensitive                   X              X
    Locally adaptive             X              X

3.1 Velocity-Based Algorithms

3.1.1 Velocity-Threshold Identification (I-VT)

Velocity-threshold fixation identification (I-VT) is the simplest of the identification methods to understand and implement. I-VT is a velocity-based method that separates fixation and saccade points based on their point-to-point velocities [e.g., 5, 18]. The velocity profiles of saccadic eye movements show essentially two distributions of velocities: low velocities for fixations (i.e., <100 deg/sec), and high velocities (i.e., >300 deg/sec) for saccades. This aspect of saccadic eye movements makes velocity-based discrimination fairly straightforward and robust. While it is possible to create a locally adaptive velocity-based fixation identification algorithm, velocity profiles have strong physical and physiological underpinnings, and thus static criteria are usually sufficient here.

I-VT begins by calculating point-to-point velocities for each point in the protocol. Each velocity is computed as the distance between the current point and the next (or previous) point. I-VT then classifies each point as a fixation or saccade point based on a simple velocity threshold: if the point’s velocity is below threshold, it becomes a fixation point; otherwise it becomes a saccade point. The process then collapses consecutive fixation points into fixation groups and discards saccade points. Finally, I-VT translates each fixation group to a representation <x, y, t, d> using the centroid (i.e., center of mass) of the points as x and y, the time of the first point as t, and the duration of the points as d. Pseudocode for the I-VT method is shown in Table 2.

Table 2: Pseudocode for the I-VT algorithm.

I-VT (protocol, velocity threshold)
  Calculate point-to-point velocities for each point in the protocol
  Label each point below velocity threshold as a fixation point, otherwise as a saccade point
  Collapse consecutive fixation points into fixation groups, removing saccade points
  Map each fixation group to a fixation at the centroid of its points
  Return fixations

I-VT requires the specification of one parameter, the velocity threshold. If angular velocities can be computed (i.e., the distance from eye to visual stimuli is known), the point-to-point velocity threshold can be approximated from a reasonable angular velocity threshold [6]; for instance, Sen and Megaw [18] used a threshold of 20 degrees/second. However, when only point-to-point velocities are known, an appropriate value for the velocity threshold may need to be inferred based on aspects of data collection (e.g., sampling frequency) along with some exploratory data analysis.
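The I-VT steps above are simple enough to sketch directly. The following Python sketch assumes (x, y, t) samples at a constant rate, so per-sample distances stand in for velocities as noted earlier; the function and helper names are illustrative, not from the paper.

```python
import math

def identify_fixations_ivt(points, velocity_threshold):
    """points: list of (x, y, t) samples; returns (x, y, t, d) fixation tuples."""
    if len(points) < 2:
        return []
    # Point-to-point velocities: distance to the next point (a constant
    # sampling rate is assumed, so distances stand in for velocities).
    velocities = [math.hypot(b[0] - a[0], b[1] - a[1])
                  for a, b in zip(points, points[1:])]
    velocities.append(velocities[-1])  # last point reuses the previous velocity

    def collapse(group):
        # Centroid of the group, onset time of the first point, duration as span.
        xs, ys, ts = zip(*group)
        return (sum(xs) / len(xs), sum(ys) / len(ys), ts[0], ts[-1] - ts[0])

    fixations, group = [], []
    for point, v in zip(points, velocities):
        if v < velocity_threshold:
            group.append(point)                # below threshold: fixation point
        elif group:
            fixations.append(collapse(group))  # saccade point ends the group
            group = []
    if group:
        fixations.append(collapse(group))
    return fixations
```

With two stationary clusters separated by one large jump, the sketch returns two fixation tuples, one per cluster.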

3.1.2 HMM Identification (I-HMM)

Hidden Markov model fixation identification (I-HMM) uses probabilistic analysis to determine the most likely identifications for a given protocol [15, 16]. Hidden Markov models (HMMs) are probabilistic finite state machines that have been employed extensively in the fields of speech and handwriting recognition [see 13]. I-HMM uses a two-state HMM in which the states represent the velocity distributions for saccade and fixation points. This probabilistic representation helps I-HMM perform more robust identification than a fixed-threshold method such as I-VT.

The crux of I-HMM is the two-state HMM shown in Figure 1. The HMM includes two sets of probabilities: observation and transition probabilities. The observation probabilities for each state (the small v distributions) represent the distribution of expected velocities in that state. The first state represents saccade points, and thus contains a distribution centered around higher velocities; the second state represents fixation points, and thus contains a distribution centered around lower velocities. The transition probabilities for each state (the arrows exiting the states) represent the likelihood of remaining in the state or making a transition to another state. The transition probabilities in Figure 1 show a large likelihood of remaining in each state (.95) and a small likelihood of making a transition (.05). Thus, the HMM provides a probabilistic representation of the observations (i.e., velocities) generated during saccadic eye movements.

Figure 1: Sample two-state HMM. The first state represents higher-velocity saccade points; the second state represents lower-velocity fixation points.

Given the two-state HMM, I-HMM determines the most likely identification of each protocol point through a process of decoding. HMM decoding finds an assignment of protocol points to states that maximizes the probability of the protocol given the HMM, using dynamic programming [13] to find this optimal assignment efficiently. The assignment associates each point with a state, thus providing an identification of each point as a fixation or saccade point. Like I-VT, I-HMM then collapses consecutive fixation points into groups and outputs fixations at the centroid of the groups. Table 3 presents pseudocode for this I-HMM algorithm.

The parameters of I-HMM comprise the observation and transition parameters in the two-state HMM: two observation probability parameters (the mean and variance of the distribution) and two transition probabilities for each state, for a total of eight parameters. While this parameter space is certainly more complex than that of I-VT, the parameters of I-HMM can be learned through a procedure called reestimation [13]. Given a training set (i.e., a set of protocols), reestimation learns the values of HMM parameters that maximize the probability of the training set given the HMM. Thus, with protocol data from a particular eye-tracking setup, reestimation can provide parameter values that, once estimated, can be used for any protocols collected in that setup.
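The decoding step can be illustrated with a small Viterbi pass over a two-state model. This Python sketch assumes Gaussian observation densities over velocities; the parameter values, state ordering, and names are illustrative stand-ins for the reestimated values the paper describes.

```python
import math

def viterbi_two_state(velocities, means, stds, p_stay=0.95):
    """Label each velocity 'saccade' (state 0) or 'fixation' (state 1)."""
    def log_gauss(v, mu, sd):
        # Log-density of a Gaussian observation model (an assumption here).
        return -0.5 * ((v - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)
    # scores[s] = best log-probability of any state path ending in state s.
    scores = [log_gauss(velocities[0], means[s], stds[s]) for s in (0, 1)]
    back = []
    for v in velocities[1:]:
        new_scores, pointers = [], []
        for s in (0, 1):
            cands = [scores[r] + (log_stay if r == s else log_switch) for r in (0, 1)]
            r_best = 0 if cands[0] >= cands[1] else 1
            pointers.append(r_best)
            new_scores.append(cands[r_best] + log_gauss(v, means[s], stds[s]))
        back.append(pointers)
        scores = new_scores
    # Backtrace the maximum-probability assignment of points to states.
    state = 0 if scores[0] >= scores[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return ["saccade" if s == 0 else "fixation" for s in path]
```

The transition penalty (log of .05 versus .95, matching Figure 1) is what discourages single-point “blips”: an isolated near-threshold velocity is absorbed into the surrounding state unless the observation evidence strongly favors a switch.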

3.2 Dispersion-Based Algorithms

3.2.1 Dispersion-Threshold Identification (I-DT)

In contrast to the velocity-based identification of I-VT and I-HMM, dispersion-threshold identification (I-DT) utilizes the fact that fixation points, because of their low velocity, tend to cluster closely together. I-DT identifies fixations as groups of consecutive points within a particular dispersion, or maximum separation [e.g., 19, 21]. Because fixations typically have a duration of at least 100 ms, dispersion-based identification techniques often incorporate a minimum duration threshold of 100-200 ms [21] to help alleviate equipment variability. The following I-DT algorithm is based on Widdel’s [21] data reduction algorithm.

The I-DT algorithm uses a moving window that spans consecutive data points, checking for potential fixations. The moving window begins at the start of the protocol and initially spans a minimum number of points, determined by the given duration threshold and sampling frequency. I-DT then checks the dispersion of the points in the window by summing the differences between the points’ maximum and minimum x and y values; in other words, dispersion D = [max(x) − min(x)] + [max(y) − min(y)]. Note that alternative dispersion metrics could be based upon spatial variance or area of samples. If the dispersion is above the dispersion threshold, the window does not represent a fixation, and the window moves one point to the right. If the dispersion is below the dispersion threshold, the window represents a fixation. In this case, the window is expanded (to the right) until the window’s dispersion is above threshold. The final window is registered as a fixation at the centroid of the window points with the given onset time and duration. This process continues with the window moving to the right until the end of the protocol is reached. Table 4 includes pseudocode for the I-DT algorithm.

We can note that this characterization of fixations uses the centroid and diameter. A circular area is usually assumed, and the mean distance from each sample to the fixation centroid provides an estimate of the radius. Also, dispersion-based algorithms are sometimes used to locate clusters within minimum spanning tree network representations (see the following section). The graph forms an efficient framework for rapid search of large sample sets.

The I-DT algorithm requires two parameters, the dispersion threshold and the duration threshold. Like the velocity threshold for I-VT, the dispersion threshold can be set to include 1/2° to 1° of visual angle if the distance from eye to screen is known. Otherwise, the dispersion threshold can be estimated from exploratory analysis of the data. The duration threshold is typically set to a value between 100 and 200 ms [21], depending on task processing demands.

Note that this characterization of I-DT is conceptually different from some dispersion-based methods that center on clustering algorithms. For instance, the leader algorithm and the k-means algorithm are two such methods [8]. However, most of these algorithms require an initial estimate of the number of clusters in the protocol, making them less useful for eye-movement data, where the number of fixations is typically unknown.
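The moving-window description above can be sketched compactly in Python. The (x, y) sample format is an assumption, and `min_points` stands in for the window size implied by the duration threshold and sampling rate (e.g., a 100 ms threshold at 60 Hz gives 6 points).

```python
def identify_fixations_idt(points, dispersion_threshold, min_points):
    """points: list of (x, y) samples; returns fixation centroids."""
    def dispersion(window):
        # D = [max(x) - min(x)] + [max(y) - min(y)], as defined above.
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    i = 0
    while i + min_points <= len(points):
        window = points[i:i + min_points]
        if dispersion(window) <= dispersion_threshold:
            # Expand the window to the right while dispersion stays below threshold.
            j = i + min_points
            while j < len(points) and dispersion(points[i:j + 1]) <= dispersion_threshold:
                j += 1
            xs = [p[0] for p in points[i:j]]
            ys = [p[1] for p in points[i:j]]
            fixations.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j   # remove window points from consideration
        else:
            i += 1  # move window one point to the right
    return fixations
```

Recomputing the dispersion over the whole window on each expansion keeps the sketch short; an incremental running min/max would preserve the linear-time behavior noted in Section 4.2.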

3.2.2 MST Identification (I-MST)

MST identification (I-MST) is based on minimum spanning trees (MSTs)—that is, a tree connecting a set of points such that the total length of the tree’s line segments is minimized. MSTs can provide a highly flexible and controllable representation for dispersion-based fixation identification [7]. A two-step approach is necessary, first requiring the construction of the MST followed by a search of the MST. Construction uses Prim’s algorithm [2]. There is one and only one MST for a set of points. The advantage of an MST data

Table 3: Pseudocode for the I-HMM algorithm.

I-HMM (protocol, HMM)
  Calculate point-to-point velocities for each point in the protocol
  Decode velocities with two-state HMM to identify points as fixation or saccade points
  Collapse consecutive fixation points into fixation groups, removing saccade points
  Map each fixation group to a fixation at the centroid of its points
  Return fixations

Table 4: Pseudocode for the I-DT algorithm.

I-DT (protocol, dispersion threshold, duration threshold)
  While there are still points
    Initialize window over first points to cover the duration threshold
    If dispersion of window points <= threshold
      Add additional points to the window until dispersion > threshold
      Note a fixation at the centroid of the window points
      Remove window points from points
    Else
      Remove first point from points
  Return fixations

representation lies in the degree of control, flexibility, and local adaptation for dispersion analysis, as well as improved subsequent characterizations of defined fixations.

Fixation identification requires traversing the already defined MST, as described in Table 5. A depth-first search is made to determine the maximum depth of interconnectivity at each point. Branching depths below a defined setpoint signal locations near the edge of the MST that are not appropriate candidates for separating fixations. If the edges connected to each endpoint exceed the defined minimum branching depth, then the network of edges connected to each endpoint is gathered into an associated edge-length distribution. The mean µ and standard deviation σ of edge lengths provide a locally adaptive comparison for separation of fixations. Separation can occur based upon comparison of the edge under consideration with both µ and σ of neighbor edge lengths. In this way, a framework is achieved for controlling where fixations may occur in an MST and for determining how local adaptivity affects fixation decisions.

The MST format allows additional characterization parameters for fixations. For example, the MST length is defined by the longest path through the graph. Critical paths can be defined by areas with minimal branching structure. These and other parameters allow an estimate of the shape and possibly direction of fixations; these go beyond simpler centroid and size characterizations.
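As a rough sketch of the two-step approach, the following builds the MST with Prim’s algorithm and then cuts edges that are long relative to the overall edge-length distribution. Note the paper’s method compares each edge against its local neighborhood (µ and σ of nearby edges, with a branching-depth check); the global cut used here is a deliberate simplification, and all names are illustrative.

```python
def mst_fixation_clusters(points, sd_factor=2.0):
    """Group point indices into candidate fixation clusters via an MST."""
    n = len(points)
    if n < 2:
        return [list(range(n))] if n else []

    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    # Prim's algorithm: grow the tree one nearest outside point at a time.
    edges = []  # (length, i, j)
    best = {i: (dist(points[0], points[i]), 0) for i in range(1, n)}
    while best:
        i = min(best, key=lambda k: best[k][0])
        d, parent = best.pop(i)
        edges.append((d, parent, i))
        for k in best:
            dk = dist(points[i], points[k])
            if dk < best[k][0]:
                best[k] = (dk, i)

    # Cut edges long relative to the overall edge-length distribution
    # (the paper's locally adaptive test uses neighborhood statistics instead).
    lengths = [d for d, _, _ in edges]
    mean = sum(lengths) / len(lengths)
    std = (sum((l - mean) ** 2 for l in lengths) / len(lengths)) ** 0.5
    kept = [(i, j) for d, i, j in edges if d <= mean + sd_factor * std]

    # Connected components of the kept edges form the clusters (union-find).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j in kept:
        parent[find(i)] = find(j)
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Because the cut test looks at all edges at once, this sketch shares the paper’s observation that I-MST is hard to run in real time: the analysis is global over the existing data points.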

3.3 Area-Based Algorithms

3.3.1 Area-of-Interest Identification (I-AOI)

The four previous identification methods can identify fixations at any location in the visual field. In contrast, area-of-interest fixation identification (I-AOI) identifies only fixations that occur within specified target areas [e.g., 3]. The target areas are rectangular regions of interest that represent units of information in the visual field. These target areas, generally used in later analyses like tracing, keep identified fixations close to relevant targets. I-AOI also utilizes a duration threshold to help distinguish fixations in target areas from passing saccades in those areas.

I-AOI begins by associating data points with target areas: it labels points within a target area as fixation points for that target, and labels points outside of all target areas as saccade points. I-AOI then collapses consecutive fixation points for the same target into fixation groups, discarding saccade points. Finally, it removes fixation groups that fall below a given duration threshold and transforms each fixation group into a fixation tuple. The pseudocode for this algorithm appears in Table 6.

As explained earlier, I-AOI can provide a larger picture than fixations alone. Like a fixation, dwell time within an area has a starting time and an ending time. However, since fixations can serve as the input data into AOI determination, the time between these dwells does not describe saccadic behavior. Rather, there could be multiple saccades and fixations interspersed between AOIs. While AOI-based dwell-time algorithms are not fixation algorithms per se, they are valuable as a concept to help explain higher-level collections of fixations organized about visual targets and areas.
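The I-AOI steps can be sketched as follows; the rectangular targets as (x1, y1, x2, y2) tuples and the (x, y, t) sample format are assumed representations for illustration.

```python
def identify_fixations_iaoi(points, targets, duration_threshold):
    """points: (x, y, t) samples; targets: (x1, y1, x2, y2) rectangles.
    Returns (x, y, t, d) fixation tuples, one per sufficiently long dwell."""
    def target_of(p):
        for k, (x1, y1, x2, y2) in enumerate(targets):
            if x1 <= p[0] <= x2 and y1 <= p[1] <= y2:
                return k
        return None  # outside all target areas: a saccade point

    def flush(group, fixations):
        # Keep the group only if it spans the minimum duration threshold.
        if group and group[-1][2] - group[0][2] >= duration_threshold:
            xs, ys, ts = zip(*group)
            fixations.append((sum(xs) / len(xs), sum(ys) / len(ys),
                              ts[0], ts[-1] - ts[0]))

    fixations, group, current = [], [], None
    for p in points:
        label = target_of(p)
        if label is not None and label == current:
            group.append(p)          # same target: extend the fixation group
        else:
            flush(group, fixations)  # target changed or saccade point: close group
            group = [p] if label is not None else []
            current = label
    flush(group, fixations)
    return fixations
```

The sketch also makes the weakness discussed in Section 4.3 visible: any point inside a target counts toward the group, so saccade samples passing through a target area inflate the reported duration.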

4. EVALUATION AND COMPARISON

To evaluate and compare the various identification methods, we now consider each method with respect to several characteristics: interpretation speed, accuracy, robustness, ease of implementation, and parameter setting. Table 7 shows a summary of the described methods according to these characteristics. To illustrate some of the issues discussed, we utilize a sample eye-movement protocol from a task in which students encoded four equation values and performed some simple computation [see 15, 16], as shown in Figure 2. Each point represents the student’s current point-of-regard sampled at a rate of 60 Hz. Points that represent fixations, as interpreted by the various identification algorithms, are drawn larger than those that represent saccades. Also, points that occur earlier in the protocol time sequence are drawn darker than those that occur later. The fixations are numbered to facilitate understanding of the protocol time sequence.

Table 5: Pseudocode for the I-MST algorithm.

I-MST (protocol, edge ratio, edge sd)
  Construct MST from protocol data points using Prim’s algorithm
  Find the maximum branching depth for each MST point using a depth-first search
  Identify saccades as edges whose distances exceed predefined criteria
  Define the parametric properties (µ, σ) of local edges, identifying saccades when an edge length exceeds a defined ratio
  Identify fixations as clusters of points not separated by saccades
  Return fixations

Table 6: Pseudocode for the I-AOI algorithm.

I-AOI (protocol, duration threshold, target areas)
  Label each point as a fixation point for the target area in which it lies, or as a saccade point if none
  Collapse consecutive fixation points for the same target into fixation groups, removing saccade points
  Remove fixation groups that do not span the minimum duration threshold
  Map each fixation group to a fixation at the centroid of its points
  Return fixations

4.1 Velocity-Based Methods

I-VT is straightforward to implement, runs very efficiently, and can easily run in real time. However, it sometimes encounters problems when point velocities hover near threshold because of eye-tracker noise or time-averaged data. Specifically, when point velocities are near threshold (e.g., midway between saccades and fixations), the strict threshold can result in “blips” in identification—fixation or saccade groups with only one or a few consecutive points. The protocol in Figure 2(a) illustrates this problem: fixations 3-6 are interpreted as four separate fixations, whereas it seems more likely that there are only one or two actual fixations represented in these data points. Similarly, fixations 11-13 seem to represent a single fixation. The problem is especially prevalent when analyzing time-averaged data, such as those in the figure. Researchers have alleviated the problem in the past by aggregating consecutive fixations over a single target into gazes on that target [9], as shown in Figure 2(b). Other work required minimum durations for either fixations or saccades; for instance, Sen and Megaw [18] required that saccade points remain above a velocity threshold for at least 10 ms.

I-HMM employs a probabilistic model rather than a fixed velocity threshold and utilizes the sequential information in the protocol. Thus, I-HMM allows for more freedom in identifying points and provides more robust analysis than I-VT, especially in the presence of significant noise. Figure 2(c) illustrates the same protocol as in Figure 2(a) analyzed using I-HMM; the problematic fixations that I-VT separated into multiple fixations (see Figure 2(a)) are grouped into more plausible fixations in Figure 2(c). Another benefit of I-HMM is that the two-state HMM can be embedded into a larger cognitive model for even more robust interpretation of eye movements [15, 16]. In addition, I-HMM, like I-VT, runs efficiently in linear time [see 15, 16] and can also run in real time. The primary disadvantage of I-HMM is the difficulty of implementing the reestimation procedure, which is both complex and tedious. However, if no reestimation of parameters is needed, the I-HMM algorithm alone (including HMM decoding) is fairly straightforward to implement.

4.2 Dispersion-Based Methods

I-DT is a linear-time, potentially real-time algorithm that produces robust identification results. For the sample protocol in Figure 2(a), it generates an almost identical analysis as I-HMM (as shown in Figure 2(c)), avoiding the “blip” problem because of its duration threshold. The primary disadvantage of I-DT is the use of two parameters that are highly interdependent; for instance, a small dispersion threshold with a large duration threshold may not result in any identified fixations. Thus, I-DT requires the most careful parameter setting but provides good results for a fairly simple algorithm.

I-MST has a somewhat different flavor than the other algorithms in the sense that it forms clusters of fixation points without using sequential information in the protocols. Thus, it is more difficult to compare with the other algorithms. I-MST allows for control of parameter settings that define a saccade by varying the relative importance of the edge length ratio and mean/variance tests for clustering. Its speed is exponential in that it becomes exponentially slower as points are added: it requires multiple traversals over the MST for each point, one to establish local edge distribution characteristics and one to test each potential cut edge. It is also more difficult to make I-MST a real-time algorithm because of its global analysis of existing data points. However, I-MST also seems to provide additional means of characterizing fixations beyond spatial location and size; for instance, the MST diameter provides the longest MST path within each fixation. Also, characteristics of the MST, such as the ratio of path length to width, can provide information on the shape of the defined fixations. Branching depth provides an additional means of assessing fixation size.

4.3 Area-Based Methods

I-AOI runs efficiently in linear time or real time. However, it suffers from a somewhat serious problem: the inclusion of many saccade points in identified fixations. Because any point within a target area classifies as a fixation point, saccade points leading into a fixation are often included in the final fixation. This problem can result in deceptively long durations for identified fixations. An even more problematic situation

Table 7: Summary of identification methods.

  Method                          Accuracy   Speed   Robustness   Impl. Ease   Parameters
  Velocity Threshold (I-VT)          √         √√        ×            √√          1
  Hidden Markov Model (I-HMM)        √√        √         √√         √ / × (a)   8 / 0 (a)
  Dispersion Threshold (I-DT)        √√        √         √√           √           2
  Minimum Spanning Tree (I-MST)      √         ×         √√           ×           2
  Area-of-Interest (I-AOI)           ×         √         √            √          1+ (b)

Key: √√ = very good, √ = good, × = not as good.
(a) I-HMM has 8 parameters in its two-state HMM that can be learned through reestimation. Without reestimation, I-HMM has 8 parameters but is simpler to implement; with reestimation, I-HMM effectively has no parameters but is more difficult to implement.
(b) I-AOI has 1 parameter but also requires specification of target areas.
arises when long saccades through large target areas are identified as fixations, especially when the data include some form of time averaging. Figure 2(d) illustrates these problems: fixation 1 includes points that likely represent a saccade (the upper grouping of points), whereas fixation 4 fails to capture apparent fixation points because they lie outside the target areas used. Overall, I-AOI can provide reasonable results [e.g., 3] for certain aggregate analyses but is generally less preferable than the other, more robust algorithms.
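The core I-AOI behavior, and the saccade-inclusion problem just described, can be seen in a minimal sketch: every point inside a target rectangle counts as a fixation point, and consecutive in-target points form a fixation if they meet a duration threshold. The function names and the (t, x, y) point format are illustrative assumptions.

```python
def iaoi(points, targets, duration_threshold):
    """Sketch of area-of-interest identification (I-AOI-style).

    points: chronologically ordered list of (t, x, y).
    targets: list of rectangles (xmin, ymin, xmax, ymax).
    Returns fixations as (t_start, t_end, target_index).
    """
    def target_of(x, y):
        for k, (x0, y0, x1, y1) in enumerate(targets):
            if x0 <= x <= x1 and y0 <= y <= y1:
                return k
        return None

    fixations = []
    run, run_target = [], None  # current run of points in one target

    def flush():
        # Keep the run only if it spans the minimum duration.
        if run and run[-1][0] - run[0][0] >= duration_threshold:
            fixations.append((run[0][0], run[-1][0], run_target))

    for t, x, y in points:
        tgt = target_of(x, y)
        if tgt is not None and tgt == run_target:
            run.append((t, x, y))  # note: saccade points inside the
            # target are swept into the fixation, inflating duration
        else:
            flush()
            run = [(t, x, y)] if tgt is not None else []
            run_target = tgt
    flush()
    return fixations
```

Because membership is purely spatial, a saccade that enters or crosses a large target rectangle contributes points to the fixation run exactly as the text warns, which is the source of the deceptively long durations.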

5. SUMMARY AND CONCLUSIONS

To summarize our qualitative analysis, I-HMM and I-DT provide accurate and robust fixation identification by incorporating sequential information to aid interpretation. I-MST also provides robust identification but runs slower than all the other algorithms. I-VT has the simplest algorithm and thus the smallest computational overhead; however, it can experience severe "blip" effects when analyzing at the level of fixations rather than gazes. I-AOI performs rather poorly on all fronts for the purpose of identification and is best not used.

These results offer several implications for future use of these and related algorithms. First, velocity-based and dispersion-based algorithms both fare well and provide approximately equivalent performance. However, area-based algorithms are too restrictive and can generate deceptive results that bias later analyses. Second, the use of temporal information can greatly facilitate fixation identification in protocols. The three

Figure 2: Fixation identification using (a) I-VT, (b) I-VT with fixations collapsed into gazes, (c) I-HMM or I-DT, and (d) I-AOI. Eye-movement protocols are taken from an equation-solving task in which a student encodes the four equation values and computes a result [see 15, 16]. Larger points represent fixations while smaller points represent saccades (as interpreted by the identification algorithm). Darker points occur earlier while lighter points occur later in the protocol.
algorithms that utilize this information in a locally adaptive way, namely I-HMM, I-DT, and I-MST, generate robust interpretations even in the presence of eye-tracking equipment noise. Third, in support of work such as Karsh and Breitenbach [10], the choice of identification algorithm can dramatically affect the resulting identified fixations. By describing their chosen algorithm in terms of the proposed taxonomy, researchers can better communicate their analysis techniques to others and thus provide more standardized and understandable results.

As the next step in evaluating and comparing fixation identification algorithms, we hope to generate more quantitative measures for comparison and examine how the various algorithms affect later analyses in the context of different domain applications. For instance, some preliminary work has shown that a subset of the algorithms described here can affect the interpretation of eye-tracking protocols with respect to the predictions of a cognitive process model [15, 16]. This increased understanding of possible identification algorithms will help to facilitate the use and analysis of eye-tracking data for both scientific research and the development of eye-based systems and applications.

REFERENCES

[1] Alpern, M. (1962). Type of movement. In H. Davson (Ed.), The Eye, Vol. 3 (pp. 63-151). New York: Academic Press.

[2] Camerini, P. M., Galbiati, G., & Maffioli, F. (1988). Algorithms for finding optimum trees: Description, use and evaluation. Annals of Operations Research, 13, 265-397.

[3] DenBuurman, R., Boersma, T., & Gerrisen, J. F. (1981). Eye movements and the perceptual span in reading. Reading Research Quarterly, 16, 227-235.

[4] Ditchburn, R. W. (1980). The function of small saccades. Vision Research, 20, 271-272.

[5] Erkelens, C. J., & Vogels, I. M. L. C. (1995). The initial direction and landing position of saccades. In J. M. Findlay, R. Walker, & R. W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes, and Applications (pp. 133-144). New York: Elsevier.

[6] Fuchs, A. F. (1971). The saccadic system. In P. Bach-y-Rita, C. C. Collins, & J. E. Hyde (Eds.), The Control of Eye Movements (pp. 343-362). New York: Academic Press.

[7] Goldberg, J. H., & Schryver, J. C. (1995). Eye-gaze-contingent control of the computer interface: Methodology and example for zoom detection. Behavior Research Methods, Instruments, & Computers, 27, 338-350.

[8] Hartigan, J. A. (1975). Clustering Algorithms. New York: John Wiley & Sons.

[9] Just, M. A., & Carpenter, P. A. (1984). Using eye fixations to study reading comprehension. In D. E. Kieras & M. A. Just (Eds.), New Methods in Reading Comprehension Research (pp. 151-182). Hillsdale, NJ: Erlbaum.

[10] Karsh, R., & Breitenbach, F. W. (1983). Looking at looking: The amorphous fixation measure. In R. Groner, C. Menz, D. F. Fisher, & R. A. Monty (Eds.), Eye Movements and Psychological Functions: International Views (pp. 53-64). Hillsdale, NJ: Erlbaum.

[11] Land, M. F., & Lee, D. N. (1994). Where we look when we steer. Nature, 369, 742-744.

[12] Noton, D., & Stark, L. (1971). Scanpaths in saccadic eye movements while viewing and recognizing patterns. Vision Research, 11, 929-942.

[13] Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257-286.

[14] Rayner, K. (1995). Eye movements and cognitive processes in reading, visual search, and scene perception. In J. M. Findlay, R. Walker, & R. W. Kentridge (Eds.), Eye Movement Research: Mechanisms, Processes, and Applications (pp. 3-21). New York: Elsevier.

[15] Salvucci, D. D. (1999). Mapping eye movements to cognitive processes. Doctoral dissertation, Department of Computer Science, Carnegie Mellon University.

[16] Salvucci, D. D., & Anderson, J. R. (1998). Tracing eye movement protocols with cognitive process models. In Proceedings of the Twentieth Annual Conference of the Cognitive Science Society (pp. 923-928). Hillsdale, NJ: Erlbaum.

[17] Salvucci, D. D., & Anderson, J. R. (in press). Integrating analogical mapping and general problem solving: The path-mapping theory. Cognitive Science.

[18] Sen, T., & Megaw, T. (1984). The effects of task variables and prolonged performance on saccadic eye movement parameters. In A. G. Gale & F. Johnson (Eds.), Theoretical and Applied Aspects of Eye Movement Research (pp. 103-111). Amsterdam: Elsevier.

[19] Stark, L., & Ellis, S. R. (1981). Scanpath revisited: Cognitive models of direct active looking. In D. F. Fisher, R. A. Monty, & J. W. Senders (Eds.), Eye Movements: Cognition and Visual Perception (pp. 193-226). Hillsdale, NJ: Erlbaum.

[20] Suppes, P. (1990). Eye-movement models for arithmetic and reading performance. In E. Kowler (Ed.), Eye Movements and Their Role in Visual and Cognitive Processes (pp. 455-477). New York: Elsevier.

[21] Widdel, H. (1984). Operational problems in analysing eye movements. In A. G. Gale & F. Johnson (Eds.), Theoretical and Applied Aspects of Eye Movement Research (pp. 21-29). New York: Elsevier.