ieee transactions on professional communication, … · visuals in content or procedures (e.g.,...

26
IEEE Proof Web Version IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008 1 Information Modalities for Procedural Instructions: The Influence of Text, Pictures, and Film Clips on Learning and Executing RSI Exercises —CHARLOTTE VAN HOOIJDONK AND EMIEL KRAHMER Abstract—Much of the empirical research on the effectiveness of different instructional designs has focused on declarative tasks, where a learner acquires knowledge about a certain topic. It is unclear to what extent findings for learning declarative tasks (which are not consistent on all aspects) carry over to learning procedural tasks, where a learner acquires a certain skill. In this paper, we describe an experiment studying a specific kind of procedural instructions, namely exercises for the prevention of repetitive strain injury (RSI), taking information modality (text versus picture versus film clip) and difficulty degree of the exercises (easy versus difficult) into account. In the experiment, participants had to learn RSI exercises and were asked to execute them. The results showed that an instruction in a picture lead to the shortest learning times followed by an instruction in a film clip. An instruction in text led to the longest learning times. For the amount of practicing the exercises during the learning phase, it was found that the participants in the film clip condition hardly engaged in practicing the exercises during the learning phase. The participants in the picture condition engaged in a moderate amount of practicing of the exercises during the learning phase. The participants in the text condition engaged in the most practicing during the learning phase. The results concerning the execution times showed that an instruction in a picture led to the lowest execution times followed by an instruction in a film clip. The instruction in text led to longest execution times. Finally, for the amount of correctly executed exercises, it was found that learning from a film clip led to the highest learning performance, both for easy and for difficult exercises. Learning from an instruction in text led to a fairly good learning performance, both for easy and difficult exercises. Learning from a picture led to a good learning performance for the easy exercises, but the performance dropped for the difficult exercises. Index Terms—Document design, information modality, procedural learning. The emergence of computer-based learning has led to new possibilities for presenting instructions and to a renewed interest in the effects of different information modalities. Instructions can be given in plain text, but also in the form of static visuals (e.g., diagrams, pictures) or in dynamic visuals (e.g., animations, film clips). Naturally, this raises the question which (combinations of) modalities are best for which learning task. This question has been addressed in a large number of recent studies (e.g., [1]–[6], among many others). Each of the aforementioned presentation modes has its own unique characteristics, and its own advantages and disadvantages from an instructional perspective. Language is the basic form of human communication, and one of its main strengths is that it is expressive and explicit (both for concrete and for abstract subject matter). An additional advantage of its written form (as opposed Manuscript received October 6, 2006; revised February 20, 2007. The authors are with the Department of Communication & Information Sciences, Faculty of Humanities, Tilburg University, 5000 LE Tilburg, The Netherlands (email: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. IEEE 10.1109/TPC.2007.2000054 to the spoken variant) is that it is not transient: written sentences remain visible on paper and can be re-read if desired, while spoken sentences are “gone” once they have been uttered. But, reading a text requires considerable skill and effort, and, moreover, text primarily facilitates linear information processing. A difference between text and pictures is their symbol system. The symbol system used for text is abstract/linguistic, while the one used for pictures is sensory. This affects the type of information that they can convey. For example, pictures can directly communicate perceptual information (e.g., spatial orientation and location). In other words, pictures have a lower “articulatory distance” [7]. Moreover, pictures are not constrained by the linear structure of text, and are therefore arguably more efficient at representing nonlinear relations among objects. In text these relationships often remain implicit [8]. The focus of many recent studies has been on the instruction effects of dynamic visuals, presumably because such instructions only recently have become a real possibility due to the increased computing power of standard multimedia PCs. A number of reasons have been suggested as to why an advantage of dynamic visuals over other presentation possibilities such as text and static visuals might be expected. For instance, it has 0361-1434/$25.00 © 2008 IEEE

Upload: others

Post on 01-Dec-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

IEEE

Pro

of

Web

Ver

sion

IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008 1

Information Modalities for Procedural Instructions:The Influence of Text, Pictures, and Film Clipson Learning and Executing RSI Exercises—CHARLOTTE VAN HOOIJDONK AND EMIEL KRAHMER

Abstract—Much of the empirical research on the effectiveness of different instructional designs has focused ondeclarative tasks, where a learner acquires knowledge about a certain topic. It is unclear to what extent findings forlearning declarative tasks (which are not consistent on all aspects) carry over to learning procedural tasks, wherea learner acquires a certain skill. In this paper, we describe an experiment studying a specific kind of proceduralinstructions, namely exercises for the prevention of repetitive strain injury (RSI), taking information modality (textversus picture versus film clip) and difficulty degree of the exercises (easy versus difficult) into account. In theexperiment, participants had to learn RSI exercises and were asked to execute them. The results showed that aninstruction in a picture lead to the shortest learning times followed by an instruction in a film clip. An instruction intext led to the longest learning times. For the amount of practicing the exercises during the learning phase, it wasfound that the participants in the film clip condition hardly engaged in practicing the exercises during the learningphase. The participants in the picture condition engaged in a moderate amount of practicing of the exercises duringthe learning phase. The participants in the text condition engaged in the most practicing during the learning phase.The results concerning the execution times showed that an instruction in a picture led to the lowest executiontimes followed by an instruction in a film clip. The instruction in text led to longest execution times. Finally, forthe amount of correctly executed exercises, it was found that learning from a film clip led to the highest learningperformance, both for easy and for difficult exercises. Learning from an instruction in text led to a fairly good learningperformance, both for easy and difficult exercises. Learning from a picture led to a good learning performance forthe easy exercises, but the performance dropped for the difficult exercises.

Index Terms—Document design, information modality, procedural learning.

The emergence of computer-based learning hasled to new possibilities for presenting instructionsand to a renewed interest in the effects of differentinformation modalities. Instructions can be givenin plain text, but also in the form of static visuals(e.g., diagrams, pictures) or in dynamic visuals(e.g., animations, film clips). Naturally, this raisesthe question which (combinations of) modalities arebest for which learning task. This question hasbeen addressed in a large number of recent studies(e.g., [1]–[6], among many others).

Each of the aforementioned presentation modeshas its own unique characteristics, and itsown advantages and disadvantages from aninstructional perspective. Language is the basicform of human communication, and one of its mainstrengths is that it is expressive and explicit (bothfor concrete and for abstract subject matter). Anadditional advantage of its written form (as opposed

Manuscript received October 6, 2006; revised February 20,2007.The authors are with the Department of Communication& Information Sciences, Faculty of Humanities, TilburgUniversity, 5000 LE Tilburg, The Netherlands (email:[email protected]; [email protected]).Color versions of one or more of the figures in this paper areavailable online at http://ieeexplore.ieee.org.

IEEE 10.1109/TPC.2007.2000054

to the spoken variant) is that it is not transient:written sentences remain visible on paper andcan be re-read if desired, while spoken sentencesare “gone” once they have been uttered. But,reading a text requires considerable skill and effort,and, moreover, text primarily facilitates linearinformation processing.

A difference between text and pictures is theirsymbol system. The symbol system used for text isabstract/linguistic, while the one used for picturesis sensory. This affects the type of information thatthey can convey. For example, pictures can directlycommunicate perceptual information (e.g., spatialorientation and location). In other words, pictureshave a lower “articulatory distance” [7]. Moreover,pictures are not constrained by the linear structureof text, and are therefore arguably more efficient atrepresenting nonlinear relations among objects. Intext these relationships often remain implicit [8].

The focus of many recent studies has been on theinstruction effects of dynamic visuals, presumablybecause such instructions only recently havebecome a real possibility due to the increasedcomputing power of standard multimedia PCs.A number of reasons have been suggested as towhy an advantage of dynamic visuals over otherpresentation possibilities such as text and staticvisuals might be expected. For instance, it has

0361-1434/$25.00 © 2008 IEEE

IEEE

Pro

of

Web

Ver

sion

2 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

been argued that dynamic visuals are beneficialfor learning since they offer a “complete model” ofa learning task (e.g., [1], [9]); in other words, theyare “informationally complete” [10, p. 249]. Whenstatic visuals or text are used, learners themselveswill have to construct a mental representation ofthe temporal aspects in the instruction. It has beenargued that dynamic illustrations offer a betterrepresentation of these temporal aspects, therebyreducing the level of abstraction and supporting adeeper level of understanding than static visualswould do [9]. Arguably, this should facilitatelearning since it would reduce learning times,require less practice, and result in fewer errors.

A substantial number of studies have tried todemonstrate this presumed learning benefit, butwith mixed results (e.g., [1], [5], [11], and [12]). Twogroups of explanations for these mixed results havebeen offered: one group of explanations stressesthe disadvantages of dynamic visuals; anothergroup points to methodological problems in thesestudies. The first explanation claims that dynamicvisuals have a fixed duration which viewerssimply have to watch. This set time period maylead to inherently longer learning times [11]. In asimilar vein, Lewalter [1] suggested that dynamicvisuals may take more time to process than otherpresentation modes; the information in dynamicvisuals changes continuously, and as a resultlearners could be overwhelmed with the amount ofinformation. Alternatively, it has been suggestedthat dynamic visuals are processed somewhatsuperficially; they require little cognitive effort, asthey can be watched rather passively (e.g., [10]and [13]). The second explanation, by contrast,argues that the mixed findings so far are primarilydue to methodological problems in the variousexperimental studies. For instance, to be reallybeneficial, dynamics should have some added value[14], which is not always the case. When it comesto temporal sequencing or indicating direction ofmotion, it has been claimed that arrows in staticvisuals may be just as effective as dynamic visuals(e.g., [15]). Tversky, Morrison, and Bétrancourt [11]point out that in some comparative studies there isa lack of equivalence between dynamic and staticvisuals in content or procedures (e.g., because thedynamic visuals convey more information or involveinteractivity). Some researchers have argued thateven when dynamic visuals do not lead to improvedlearning, they are more attractive and motivatingthan other instruction forms and should bepreferred for that reason (e.g., [11], [16], and [17]).However, this “subjective satisfaction” [18, p. 33] of

various instruction modes is often not addressed,and when it is the results are equivocal (e.g., [19]).A further complicating factor is that the learningdomain itself might also have an influence on therelative benefits of information modalities [14]. Itseems reasonable to assume that different learningdomains benefit from different kinds of instructions(cf., also [20]). In the field of learning psychology,many studies have shown that dynamic visualsare used as learning instructions for descriptive,“scientific” explanations (often describing causesand effects), such as mechanical and electronicsystems (e.g., brakes, pumps, generators [21],[22]; electronic circuit systems [23]); mathematicalconcepts (e.g., algebra [24]); or complex naturalphenomena (e.g., electricity [25]; lightning [26];gravitational lensing [1]; meteorological changes[2]). Arguably, some of these scientific learningdomains lend themselves better for dynamicvisualization than others. Moreover, what holdsfor descriptive tasks (such as those above), maynot apply to procedural ones, such as bandaginga hand [5], folding origami models [27], or tyingnautical knots [28]. These procedural tasks differfrom descriptive ones in a number of respects. Notonly is the nature of the task different (proceduresare inherently more linear—one step followinganother), but the learning goal is also different (thefocus is not only on understanding, but also onacquiring certain capabilities or skills) [29]. Oneof the factors that might affect the processing ofprocedural information is the design format [29].Procedural information is often presented in atext and/or picture format (e.g., route directions,maintenance instructions, assembly instructions).While pictures may help users to form a mentalmodel of the procedure [30], [31], the combinationof text plus picture may not always be helpful. Forexample, a user’s attention has to switch betweenthe information presented in text and picture,leading to the so-called split attention effect [32].Therefore, document designers have to understandthe strengths and limitations of both text andpictures when designing procedural information[8]. Both the type of picture (e.g., overview versuspartial view [33], line drawing versus photo [5],object-centered versus body-centered [34], etc.)as well as the type of textual instruction (e.g.,user-centered versus object-centered versusenvironment-centered [35]) may influence auser’s performance of the procedure. To furthercomplicate the situation, it may well be that besidesvariation between domains there is also variationwithin domains. Arguably, some learning tasks ina given domain are easier than others, and this

IEEE

Pro

of

Web

Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 3

may have an influence on the relative contributionof various instruction formats for those tasks.

In this paper, we report on a study of the effects oftask difficulty and information modality (comparingdynamic visuals with static visuals and text) onlearning a specific class of procedural tasks,namely exercises aiming at the prevention ofrepetitive strain injury (RSI). RSI is a general termfor disorders that are caused by repetitious use ofhands, arms, and shoulders, often as a result ofprolonged computer terminal work [36]. Recently,the term CANS (short for complaints of arm, neckand/or shoulders) was introduced as an alternativeterm, but here we will stick to the more familiarRSI terminology. It is generally assumed thattaking regular breaks during computer work incombination with exercises is beneficial for theprevention of RSI (e.g., [37]–[39]). RSI exercisesoffer a new and understudied learning domain,with various interesting properties. Generally,these exercises involve little or no abstraction, donot consist of a complicated sequence of actions,and are relatively short. Moreover, the exercisesare highly recognizable (almost everybody has twohands). It is interesting to observe that currentRSI information websites offer a large variety ofprevention exercises, in many different presentationformats (see Fig. 1 for a representative selection).These differences raise the natural question of

what is the effectiveness of the various presentationformats. Note also that there is substantialvariation in the difficulty level of existing RSIexercises, so that the other factor of interest(variation in task difficulty) can be modelled in afairly straightforward way in this domain.

The potential influence of both presentationmodality and task difficulty on learningperformance can be motivated from COGNITIVE

LOAD THEORY (e.g., [40]–[43]), a theory which aimsto develop optimal instructional designs whileconsidering the limitations of the human mind.Cognitive load theory builds on the broadly acceptedassumption that the mind combines a short-term(or working) memory of very limited capacity (whereall conscious activity and processing of informationoccurs [44], [45]) and a long-term memory witha virtually unlimited capacity [42]. According tocognitive load theory, learning amounts to theconstruction of new (or modification of existing)schemata [46], which are subsequently stored inlong-term memory. Since the capacity of workingmemory is severely limited, the cognitive load oflearners should be kept at a minimum duringlearning. In the current version of the theory [42],[43], three kinds of load are distinguished: INTRINSIC

load, caused by the structure and intrinsic natureof the learning task; EXTRANEOUS load, causedby the manner of presentation and its influence

Fig. 1. Three different websites using different presentation formats to illustrate RSI prevention exercises.

IEEE

Pro

of

Web

Ver

sion

4 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

on working memory; and GERMANE load, causedby learners’ efforts to process and comprehendlearning material. The sum of these three kinds ofload should not exceed working memory capacityin order to avoid cognitive overload. As arguedabove, the intrinsic load may vary both betweenand within task domains. Since the intrinsic loadof a particular learning task is fixed, instructiondesign can only influence the extraneous and thegermane load, and, obviously, when the intrinsicload of a particular task is high, there is relativelyless room for the extraneous and the germane load.

Germane load is a “positive” form of load, sinceit encourages learners to engage in cognitiveprocessing that may lead to improved schemataconstruction [42], [47]. This suggests that optimalinstructions are those which minimize extraneousand maximize germane load, while simultaneouslyavoiding overload. A complicating factor is thatthe distinction between extraneous and germanecognitive load, although intuitively plausible,cannot reliably be measured [47]. In general,measuring cognitive load in realistic learningsituations is not straightforward [48]. While somerecent attempts have been made to measurecognitive load directly (e.g., the dual task approachadvocated by Brünken, Plass, and Leutner [49]),many studies rely on indirect measures such asbehavioural or learning outcome measures (e.g.,[47] and [49]). Although these measures only relateto cognitive load indirectly, they do assess learningbehaviour directly, which is sufficient for ourcurrent purposes. In this study, we concentrateon learning times, amount of practicing duringlearning, execution times, and execution errors.

Arguably, the different information modalitiesof interest (text, picture, film clip) have differentloading potentials, which may influenceperformance on one or more of these measures.Of all three modalities, it seems likely that textimposes the highest load: It requires more mentaleffort to read a text than to watch a picture or afilm clip, hence it seems likely that learning fromtext takes longer than learning from pictures orfilm clips. Potentially, an additional complicationfor learning RSI prevention exercises from text isthat schemata construction may be more involvedthan for pictures and film clips. In the text case,learners have to “translate” the textual instructionsto manual gestures. Notice that this is a concreteinstance of Glenberg’s Indexical Hypothesis, statingthat readers associate words and phrases withobjects and actions in “the real world,” whichshould facilitate understanding (e.g., [50]–[52]). An

instruction in the form of a film clip and (probablyto a lesser extent) a picture depicts the hand andarm movements the learner should make, whereasthe learner has to infer these gestures when theyare presented in text. Hence, it is hypothesized thatlearners will practice more often while learning fromtext than while learning from visual presentations.An interesting question is whether the load imposedby text is only extraneous or also partly germane.It might be that the extra effort required to learnfrom text leads to good learning results (shortexecution times, few execution errors), especiallywhen the intrinsic load is low (so that overload canbe avoided).

Pictures arguably impose the lowest load of thethree presentation modalities; viewing a staticpicture requires little mental effort. Since a picturecaptures the essential information of a procedure, itis to be expected that learning times for pictures arerelatively short. However, due to their static nature,pictures offer little information about the temporalstructure of a procedure, and for more complicatedexercises (i.e., exercises with a higher intrinsic load)this may hamper schemata construction and mightresult in suboptimal execution performance.

To what extent film clips impose cognitive loadis uncertain. On the one hand, it can be arguedthat they may induce load, since the film clipscontinuously change and learners have to processthis information, which reduces the cognitiveresources available for germane load. But theymay also lessen cognitive load by relieving thelearner in the translation process, which isrequired when reading text. The main strengthof film clips might well be their “informationalcompleteness”—learners do not have to infer theexact sequence of movements from text or froma single snapshot because the entire sequence ofactions is visualized, which arguably facilitatesschemata construction. This “completeness”suggests that learners will not practice muchduring learning. Whether this will also result in fewexecution errors is not obvious; it may be that thelarge amount of external support relieves learnersof cognitive load demands that they would beable to fulfil, but which might prevent them fromperforming beneficial cognitive actions for learning.

The findings of this study will not only betheoretically relevant, but may also informmore practical work on multimodal informationpresentation. An example of such work is amedical question-answering (QA) system, which iscurrently developed within the IMOGEN (Interactive

IEEE

Pro

of

Web

Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 5

Multimodal Output GENeration) project that isembedded in the IMIX research programme. IMIX(Interactive Multimodal Information eXtraction)is a research programme in the field of Dutchspeech and language technology. A QA system isan automatic system that can answer a user’squestion posed in natural languages (e.g., Whatdoes RSI stand for?) with an answer formulated innatural language (e.g., RSI stands for RepetitiveStrain Injury). Today, QA systems are not onlyexpected to give answers to simple questions,like What does RSI stand for? but also to morecomplex questions, like How should I organize myworkspace in order to prevent RSI? or What is a goodexercise to prevent RSI in my hands? The answers tothese questions might be more informative if theycontained a picture or an animation, rather thanan extensive textual answer [53]. In the IMOGENproject, different aspects of multimodal informationpresentation are studied in order to improve theoutput quality of QA systems.

To find out what the actual strengths andweaknesses of the various information modalitiesare, we describe an experiment in whichparticipants have to learn and execute 20 RSIprevention exercises in two degrees of difficulty.We measure the following features: the influence ofpresenting an instruction in text; picture or filmclips on learning times; the amount of practicingduring learning; execution times; and number ofexecution errors. Besides these objective measures,participants are also asked for their subjectivesatisfaction.

METHOD

Participants Participants were 30 young adults,between 18 and 30 years of age. Of the participants,15 were male and 15 were female.

Design The experiment had a 3 (informationmodality) 2 (difficulty degree) factorial design,with information modality (dynamic visual [filmclip], static visual [picture], text) used as abetween-participants variable, and difficultydegree (easy, difficult) as a within-participantsvariable, and with learning times, amount ofpracticing during learning, execution times, andnumber of correctly executed exercises were usedas dependent variables. The participants wererandomly assigned to an experimental condition.

Stimuli A total of 20 RSI prevention exercises werechosen from websites on RSI prevention and RSIinjury prevention software (e.g., http://web.mit.

edu/atic/www/disabilities/rsi/exercises.html andhttp://www.workpace.com). The chosen exerciseswere all exercises to prevent RSI in hands and arms.Of the 20 exercises, 10 were assumed to be easy toperform, and 10 were assumed to be difficult. Thecriterion for a difficult exercise was that it shouldbe either a complex symmetrical movement or anasymmetrical movement. COMPLEX SYMMETRICAL

MOVEMENTS were classified as movements consistingof at least three sequential atomic movements, inthe course of which both arms and hands makethe same movements. ASYMMETRICAL MOVEMENTS

were classified as movements in which theparticipant should make a different movement witheach arm or hand. EASY EXERCISES were neitherasymmetrical nor complex movements. Fig. 2shows representative examples of an easy and adifficult exercise.

Fig. 2. Two typical exercises: one easy and one difficult.

Note that this operationalization of easy anddifficult tasks is based on the relative complexity ofthe sequence of motoric movements. To find out towhat extent this objective criterion coincided withsubjective perception of task difficulty, a pre-testwas carried out in which nine participants wereasked to classify the 20 exercises (presented intext plus picture format, and in random order).They were instructed to make two piles, the firstconsisting of the 10 exercises they consideredeasiest to perform, and the second pile consistingof the 10 exercises they considered more difficultto perform. It turned out that of the 10 exercisesclassified as easy, 7 were indeed perceived as easy

IEEE

Pro

of

Web

Ver

sion

6 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

by the vast majority ( 75%) of the participants,while the remaining 3 were perceived as difficult bya majority of the participants (presumably becausethese consisted of gestures that are motoricallysimple but not commonly used and hence withwhich participants may have been less familiar).Of the 10 exercises classified as difficult, 9 wereindeed perceived as such by the vast majority( 75%), while the remaining 1 was perceived aseasy by most participants (interestingly, this wasan exercise that was motorically complex butfamiliar to most participants). In sum, for 16 ofthe 20 exercises, there was a clear and consistentcorrelation between the objective and the perceiveddifficulty degree.

The 20 RSI prevention exercises were presentedin three different formats. In the text condition,the exercises were explained to the participantsin a purely textual format. The total amount ofwords did not differ between the 10 easy and 10difficult exercises: both counted 268 words in total.Thus, the average length was almost 27 words perexercise. Since some exercises are a few wordsshorter and others a few words longer, we will onlyreport on mean total averages for the 10 exercisesin each condition. Because natural language isoften ambiguous, the text exercises were checkedfor their comprehensibility in a second pre-testwith three participants (different from those in thefirst pre-test). It turned out that a few exercises ledto misunderstandings and so these exercises werereformulated. The new versions were presented tothe same three participants, and they agreed thatthe reformulations solved the misunderstandings.

In the picture condition, each of the 20 exerciseswas displayed in a single photograph. Digitalcamera pictures were taken of a female who didthe exercises. She wore black clothing and themovements were shot against a black backgroundso that only her hands were visible in the picture.The photo depicted the STROKE of the movement,with the phase of the movement as it unfoldsin time containing the “semantic content” ofthe movement [54]. To indicate the direction ofmovement, arrows were added to the pictures of9 difficult and 2 easy exercises. The size of thepictures was 1536 1014 pixels.

For the film clip condition, the same female in anidentical surrounding was filmed with a digitalfilm camera (25 frames per second). The totalamount of frames did not differ between easy

and difficult exercises: both counted 1097 frames(average film length was thus 4.39 seconds). Again,since some film clips are somewhat shorter, andothers somewhat longer, we will report on meantotal averages for the 10 easy and the 10 difficultexercises.

The exercises were presented to the participantsvia a website: one website for each condition. Theexperiment was run on a multimedia PC, with a17-inch color monitor. In all three conditions, thepresentations of the RSI exercises appeared at thecentre of the display. On the video website, themovements were shown in a film clip with a startand a stop button and a slider. The participantshad the opportunity to view the film as often as theydesired, but not much use was made of this option.Exercises were presented in one of two randomorders to control for possible learning effects.

Procedure The experiment consisted of threeparts: a practice session, the central part inwhich the 20 RSI exercises were presented, anda questionnaire. The participants completed eachpart individually. Each participant was invited toan experimental laboratory, where he or she tooka seat behind the computer. Participants were toldthat they would receive 20 exercises which theyhad to learn and perform to assess to what extentthey suffered from RSI. In addition, they were led tobelieve that their hand and arm movements werefilmed because a doctor would later look at therecordings of the participants to determine to whatextent they suffered from RSI. The procedure of theexperiment is depicted in Fig. 3.

After the participants had read the instructions,they could click on the hyperlink start, and the firsttrial exercise would appear. Depending on theirexperimental condition, the participants read orviewed the trial exercise until they thought that theycould properly execute the exercise. Subsequently,they clicked the hyperlink next at the bottom ofthe page. A new webpage appeared with the text“execute trial exercise” at the centre of the page.During the execution of an exercise, participantscould not see the instruction. When they hadexecuted the exercise, they clicked the hyperlinknext exercise at the bottom of the page. Aftercompleting the second trial exercise, participantswere asked if they had any questions regarding theexperimental procedure. If not, the participantscould start the actual experiment, proceeding in thesame way as they did in the trial session. Duringthe experiment, there was no further interactionbetween participants and the experiment leader.

IEEE

Pro

of

Web

Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 7

Fig. 3. Procedure of the experiment.

After the participants completed the 20 experimentalexercises, they received a questionnaire. In thisquestionnaire, the subjective satisfaction regardingthe content and structure of the website as wellas the comprehensibility and the attractiveness ofthe exercises were measured. The questionnaireconsisted of 16 bi-polar seven-point semanticdifferentials addressing structure and content ofthe websites, as well as comprehensibility andattractiveness of the exercises (see Appendix A).The presentation order of the adjectives wasrandomized. For processing, the positive adjectiveswere mapped to one (very positive) and the negativeadjectives were mapped to seven (very negative).

The participants were debriefed at the end of theexperiment.

Data Processing The following data was collected:learning times; number of practiced exercisesduring the learning time; execution times; numberof correctly executed exercises; and the results ofthe questionnaire.

The time it took the participants to learnand execute the exercises was computedfrom the data of the log program ProxyPlus(http://www.proxyplus.com). This programregistered the times associated with participants’mouse clicks during the experiment. The timeperiod between clicking the hyperlink next whichpreceded the instruction of an exercise and thehyperlink next which followed the instructionof an exercise was defined as the LEARNING TIME

(see Fig. 3). The time period between clicking thehyperlink next which followed the instruction of anexercise and the hyperlink next that preceded anew instruction of an exercise was defined as theEXECUTION TIME (see Fig. 3).

A digital video camera was used to recordthe movements the participants made duringthe experiment. These video recordings of theparticipants’ hands and arms were used to assesswhether the participants practiced the exercisesduring the learning period and to assess theirperformances while executing the RSI exercises.Occasionally, a participant performed an exercisein a correct but not intended way. These cases werecounted as correctly executed exercises. One judgedid the assessment; scoring was easy, and the fewdifficult cases that did arise were resolved afterdiscussion.

Tests for significance were performed using arepeated measures analysis of variance (ANOVA),with a significance threshold of 0.05. For post-hoctests the Bonferroni method was used. Theinternal consistency of the four item sets of thequestionnaire was measured using Cronbach’salpha.

RESULTS

Learning Times Table I summarizes the results.First consider the learning time. We found thatthe difficulty degree had an effect on the amountof time to learn the exercises ( ,

). Overall, the participants needed moretime to learn the difficult exercises than the easyones. Also, the information presentation had

IEEE

Pro

of

Web

Ver

sion

8 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

an effect on the learning time ( ,). Participants in the picture condition

required the shortest learning times, participantsin the text condition had the longest learningtimes, with learning times from film clips falling inbetween those two. Post-hoc tests indicated thatthe instruction in text differed significantly from theinstruction in a picture . There was nosignificant difference between the instruction in textand instruction in a film clip . Also, theinstruction in a picture did not differ significantlyfrom the instruction in a film clip . Nosignificant interactions between difficulty degreeand information modality were found.

Amount of Practicing During the Learning PeriodTable I also reveals that the participants practicedthe difficult exercises more often than the easy onesduring the learning period, and this difference wasfound to be statistically significant ( ,

). Also, information presentation had aneffect on the amount of practicing during thelearning period ( , ). In thefilm clip condition, participants almost neverpracticed; in the picture condition, they practicedfor about a fifth of the exercises, while in the textcondition, participants practiced about half of theexercises. Post-hoc tests showed that the differencebetween the instruction in text and the instructionin a picture was not significant . Textdiffered significantly from the instruction in a filmclip . There was no significant differencebetween the instruction in a picture and a film clip

, presumably because of the relativelyhigh standard deviation. No significant interactioneffects were found.

Execution Times The difficulty degree had amain effect on the amount of time needed toperform the exercises ( , ).The participants required more time to execute

the difficult exercises than the easy ones. Therewas also a main effect of information modality onthe execution times ( , ).Participants in the text condition had much longerexecution times than those in the picture andfilm clip conditions. Participants in the picturecondition needed somewhat less time for executionthan the participants in the film clip condition,but the differences are relatively small and thestandard deviations are relatively high. Post-hoctests indicated that there was a significantdifference between the instruction in text andthe instruction in a picture . Also, theinstruction in text significantly differed from theinstruction in a film . There was nosignificant difference between the instruction in afilm clip and in a picture . There was nosignificant interaction between difficulty degree andinformation modality.

Amount of Correctly Executed Exercises Amain effect of difficulty degree on the performancewas found ( , ). As canbe seen in Table I, the participants executed onaverage 9.1 easy exercises correctly, as opposedto 8.3 difficult ones. There was also a maineffect of information modality: the participantswho watched the film clip executed the mostmovements correctly ( , ). Atwo-way interaction between difficulty degree andinformation modality was found ( ,

). This interaction effect can be explained asfollows: For the instruction in text ( ,

) and for the instruction in a film clip( , ) no significant differenceswere found in the amount of correctly executedeasy and difficult exercises. However, for theinstruction in a picture ( , ),a significant difference was found in the amountof correctly executed easy and difficult exercises.

TABLE ILEARNING PERFORMANCE RESULTS PER DEGREE OF DIFFICULTY AND INFORMATION MODALITY

IEEE

Pro

of

Web

Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 9

The participants in the picture condition correctlyexecuted fewer of the difficult RSI exercises thanthe easy exercises (respectively, 9.2 versus 7.6).

Subjective Satisfaction The internal consistencyon the four item sets in the questionnaire wasmeasured using Cronbach’s alpha. Followingstandard practice, we qualify alpha as adequate ifits value was higher than 0.70. For the structure ofthe website the alpha was 0.78, and for the contentof the website it was 0.56. In Table II, we reportseparately on the items concerning the content ofthe website. The alpha for the comprehensibilityof the exercises was 0.82, and 0.83 for theirattractiveness.

TABLE IIMEAN RESULTS FOR THE SUBJECTIVE SATISFACTION

QUESTIONNAIRE PER INFORMATION MODALITY (FROM 1 =“VERY SATISFIED” TO 7 = “VERY DISSATISFIED”

Table II gives an overview of the results of thesubjective satisfaction questionnaire. Informationmodality had no effect on the subjective satisfactionregarding the website and the exercises. No effectswere found between the three conditions for thewebsite and for the exercises .

CONCLUSION AND DISCUSSION

In this paper, we investigated the learnability ofRSI prevention exercises in different presentationformats. What effective ways of learning suchexercises is an important research question withclear practical ramifications, in view of the growing

awareness of the importance of RSI prevention andthe bewildering array of presentation formats forRSI prevention exercises currently being employedon the internet and in RSI prevention software.

An experiment was performed looking at theeffect of offering RSI prevention exercises in threedifferent formats (film clips, pictures, or text)focusing on learning time, the amount of practicing,execution time, learning results, and subjectivesatisfaction. To model variation within a domain,20 RSI prevention exercises were selected fromdifferent sources in such a way that 10 exerciseswere motorically easy (symmetric and simple) and10 exercises were more difficult (asymmetric orcomplex). A pre-test with nine participants revealedthat there was a strong connection betweenobjective and perceived difficulty degree.

As a manipulation check, we tested whethereasy exercises were “easier” to perform thanthe assumed difficult ones. The results showedthat irrespective of presentation format, the easyexercises were associated with shorter learningtimes, less practicing, shorter execution times, andless execution errors than the difficult exercises.It is worth repeating that the summed length (interms of frames and number of words) of the 10easy exercises was exactly the same as the summedlength of the 10 difficult ones, so that theseeffects can only be attributed to the differences incomplexity (intrinsic cognitive load).

Of the three presentation formats underinvestigation, text was expected to impose thehighest load. Overall, text indeed led to the longestlearning times, which can in part be ascribed tothe fact that it takes more mental effort to read atext than to look at a picture or watch a film clip.Furthermore, during the learning phase, people notonly read the text, but also engage in a substantialamount of practicing, which takes time as well.People in the text condition by far did the mostpracticing, which is consistent with the IndexicalHypothesis [50]–[52]: To foster understanding,participants “translated” the textual instructionsinto actual movements during learning for manyexercises. Participants must thus engage in fairlydeep processing of the textual instructions, butarguably this is to some extent a form of germaneload: The deep processing and practicing appearto be beneficial for learning. Contrary to whatone might expect, the relatively long learningphase does not lead to shorter execution times.Still, it does lead to a surprisingly good learningperformance. For the easy exercises, participants

IEEE

Pro

of

Web

Ver

sion

10 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

in the text condition made a few more errors thanparticipants in the other two conditions, but for thedifficult exercises, performance was even slightlybetter than for pictures, which suggests that thegermane activities pay off.

Pictures were expected to impose the lowest load.The average learning times were indeed lowest inthe picture condition, as were the average executiontimes. The pictures did not offer a complete modelof the exercise, because only the stroke of themovement was depicted, with arrows indicatingmotion where applicable. The expectation wasthat people generally would be able to derive thecomplete exercise on the basis of this information.This might explain why a moderate amount ofpracticing took place in this condition (morethan for film clips). For easy exercises, learningfrom pictures led to a good performance. In fact,participants made as few errors for these exercisesas participants in the film clip condition. But theperformance dropped for the difficult exercises,where as many errors were made as in the textcondition.

Concerning the load of film clips, the followingtwo contrasting hypotheses were mentioned: Theymight induce load because a learner has to processthe continuously changing images, but they mightalso reduce load, freeing the learner by simplypresenting a complete, physical model of the taskto be carried out. It was found that film clips ledto medium length learning times (between pictureand text). In part, this can be attributed to thefact that watching a clip takes a fixed length.But it is interesting to see that difficult exercisesrequired longer learning times than easy ones, eventhough they were of the same average duration(and it is not the case that learners practiceddifficult exercises more often), which was probablydue to the higher average intrinsic load of thedifficult tasks. There was virtually no practicingin the film clip condition, as expected, since theclips offer an informationally complete model ofthe task. Contrary to expectation, the executiontimes were not the shortest, which suggests thatparticipants still had to engage in cognitive activityduring the execution phase. Film clips did lead toa consistently high learning performance, bothfor easy and for difficult exercises. Hence, despitepracticing during learning, learners do constructthe necessary scheme based on germane cognitiveprocesses.

It is interesting to observe that none of thepresentation formats appeared to be superior on all

dimensions of interest; each had some disadvantage(less efficient for learning, relatively many errors,etc.). In view of this, it is perhaps not surprisingthat the subjective satisfaction questionnaire didnot reveal any significant differences. The factthat no single modality outperformed the otherson all dimensions implies that there is no single,straightforward design recommendation on howto present information (e.g., in the context of amedical QA system). The goal of the informationpresentation influences the type of presentation.For example, if the amount of practicing isconsidered to be the most important factor, the RSIexercises are best described in text. However, ifquick leaning is most important, the RSI exercisesare best illustrated with a picture. Finally, whenthe goal of the information presentation is agood overall execution, the RSI exercises are bestvisualized with a film clip.

RSI prevention exercises offer a new and ecologicallyvalid learning situation with a number of interestingproperties. These exercises are quite brief andarguably relatively easy to learn. A downside of thischoice is that, with respect to learning performance(number of errors), there may be a ceiling effect,in that easy exercises are learned very well for allthree modalities. It would be interesting to redo theexperiment with more complex RSI related tasks,and see whether this change would lead to moredifferentiation between the different modalitieswhere errors are concerned.

While we did our best to make sure that theexercises in the three conditions offered comparableinformation, following the recommendations ofTversky et al. [11], it turned out that this wasnot always straightforward. While a static picturecombined with an arrow indicating direction ormotion can be very informative, it does not makethe entire intended movement explicit as a dynamicpicture does. In the former case, but not in the lattercase, the learner has to infer the full movement,which may lead to errors, especially for the difficultexercises as we have seen. Still, it is interestingto see that the efficiency of pictures (learning andexecution times) is higher than that of the othertwo modalities, and leads to nearly optimal resultsfor easy exercises. This finding indicates that aparticularly efficient method for illustrating morecomplex exercises might be via a series of picturesdepicting key stages of the procedure. One wouldexpect that this could lead to both high efficiencyand good learning results, for easy as well as fordifficult exercises. It would also be interesting tostudy the effect of series of pictures versus single

IEEE

Pro

of

Web

Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 11

pictures, or film clips in multimodal instructions(e.g., the use of visuals plus speech). We intend toexplore these possibilities in future research.

In a somewhat similar vein, we found that certainRSI exercises seem to be inherently easier torepresent than others, and especially that thisease-of-representation may vary across differentpresentation formats. Some movements can beconcisely described in language because the entiremovement has been “coded” in a fixed expression(e.g., make fists), whereas other movements can be

rather cumbersome to describe. Also, expressinghow a particular movement “feels” (e.g., spreadyour fingers until a mild stretch between the fingersis felt) is obviously easier in language than instatic or dynamic visuals. For such exercises, atextual presentation might have an added valueover other presentation formats. It would beinteresting to systematically vary the presence orabsence of linguistic short cuts (describing complexmovements in a few words) and investigate howthis variance influences the effectiveness of textualinstructions.

APPENDIX

SUBJECTIVE SATISFACTION QUESTIONNAIRE

(ENGLISH TRANSLATION OF DUTCH ORIGINALS)

(1) The structure of the website is: (a)Orderly/Disorderly (b) Clear/Unclear (c) Easyto access/Not easy to access (d) Easy to seethrough/Not easy to see through

(2) The content of the website is: (a)Informative/Uninformative (b) Clear/Unclear(c) Comprehensible/Incomprehensible (d)Understandable/Unintelligible

(3) The comprehensibility of the exercisesis: (a) Easy/Difficult (b) Simple/Hard (c)Clear/Unclear (d) Unambiguous/Ambiguous

(4) The attractiveness of the exercises is: (a)Varied/Unvaried (b) Interesting/Uninteresting(c) Appealing/ Unappealing (d) Fascinating/Tiresome

ACKNOWLEDGMENT

A preliminary analysis of the experiment hasappeared in Dutch: C. van Hooijdonk andE. Krahmer, “De invloed van unimodale enmultimodale instructies op de effectiviteitvan RSI-preventieoefeningen,” Tijdschrift voorTaalbeheersing, vol. 28, no. 2, pp. 73–87, 2006.

The current research is part of the InteractiveMultimodal Information eXtraction (IMIX) project,which is funded by the Netherlands Organisationfor Scientific Research (NWO). Thanks are dueto P. Barkhuysen and L. van der Laar for theirhelp in setting up the experiment, and to F. Maesand N. Ummelen for their useful comments on an

earlier version of this paper. Moreover, we wouldlike to thank the two anonymous reviewers for theirsuggestions.

REFERENCES

[1] D. Lewalter, “Cognitive strategies for learningfrom static and dynamic visuals,” Learning andInstruction, vol. 13, no. 2, pp. 177–189, 2003.

[2] R. Lowe, “Interrogation of a dynamic visualizationduring learning,” Learning and Instruction, vol. 14,no. 3, pp. 257–274, 2004.

[3] R. Mayer, Multimedia Learning. New York:Cambridge Univ. Press, 2001.

[4] R. Mayer, “The promise of multimedia learning:Using the same instructional design methodsacross different media,” Learning and Instruction,vol. 13, no. 2, pp. 125–139, 2003.

[5] I. Michas and D. Berry, “Learning a proceduraltask: Effectiveness of multimedia presentations,”Appl. Cognitive Psychology, vol. 14, no. 6, pp.555–575, 2000.

[6] R. Ploetzner and R. Lowe, “Dynamic visualisationsand learning,” Learning and Instruction, vol. 14,no. 3, pp. 235–240, 2004.

[7] T. Williams and D. Harkus, “Editing visualmedia,” IEEE Trans. Prof. Commun., vol. 41, no. 1,pp. 33–45, Mar., 1998.

[8] J. Larkin and H. Simon, “Why a diagram is(sometimes) worth a thousand words,” CognitiveSci., vol. 11, no. 1, pp. 65–99, 1987.

[9] O. Park and R. Hopkins, “Instructional conditionsfor using dynamic visual displays: A review,” J.Instructional Sci., vol. 21, no. 6, pp. 427–449, 1993.

[10] W. Schnotz, J. Böckheler, and H. Grzondziel,“Individual and co-operative learning withinteractive animated pictures,” Eur. J. PsychologyEdu., vol. 14, no. 2, pp. 245–265, 1999.

[11] B. Tversky, J. Morrison, and M. Bétrancourt,“Animation: Can it facilitate?,” Int. J.Human-Comput. Stud., vol. 57, no. 4, pp. 247–262,2002.

IEEE

Pro

of

Web

Ver

sion

12 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

[12] M. Bétrancourt and B. Tversky, “Effects ofcomputer animation on user’s performance: Areview,” Le Travail Humain, vol. 63, no. 4, pp.311–329, 2000.

[13] W. Schnotz and T. Rasch, “Enabling, facilitating,and inhibiting effects of animations in multimedialearning: Why reduction of cognitive load can havenegative results on learning,” Educational Technol.Res. Devel., vol. 53, no. 3, pp. 47–58, 2005.

[14] R. Weiss, D. Knowlton, and G. Morrison,“Principles for using animation in computer-basedinstruction: Theoretical heuristics for effectivedesign,” Comput. Human Behavior, vol. 18, no. 4,pp. 465–477, 2002.

[15] B. Tversky, J. Zacks, P. Lee, and J. Heiser,“Lines, blobs, crosses, and arrows: Diagrammaticcommunication with schematic figures,” in Theoryand Application of Diagrams, M. Anderson, P.Cheng, and V. Haarslev, Eds. Berlin: Springer,2000, pp. 221–230.

[16] E. Perez and M. White, “Student evaluationof motivational and learning attributes ofmicrocomputer software,” J. Comput.-BasedInstruction, vol. 12, no. 2, pp. 39–43, 1985.

[17] L. Rieber, “Animation, incidental learning, andcontinuing motivation,” J. Educational Psychology,vol. 83, no. 3, pp. 318–328, 1991.

[18] J. Nielsen, Usability Engineering. New York:Academic, 1993.

[19] J. Pane, A. Corbett, and B. John, “Assessingdynamics in computer-based instruction,” in Proc.ACM Conf. Human Factors in Computing Systems,1996, pp. 797–804.

[20] M. Hegarty, “Dynamic visualizations and learning:Getting to the difficult questions,” Learning andInstruction, vol. 14, no. 3, pp. 343–351, 2004.

[21] R. Mayer, “Systematic thinking fostered byillustrations in scientific text,” J. EducationalPsychology, vol. 81, no. 2, pp. 240–246, 1989.

[22] R. Mayer and J. Gallini, “When is an illustrationworth a thousand words?,” J. EducationalPsychology, vol. 82, no. 4, pp. 715–726, 1990.

[23] O. Park and S. Gittelman, “Dynamiccharacteristics of mental models and dynamicvisual displays,” J. Instructional Sci., vol. 23, no.5–6, pp. 303–320, 1995.

[24] S. Reed, “Effects of computer graphics onimproving estimates to algebra word problems,”J. Educational Psychology, vol. 77, no. 3, pp.285–298, 1985.

[25] P. Cheng, “Electrifying diagrams for learning:Principles for complex representational systems,”Cognitive Sci., vol. 26, no. 6, pp. 658–736, 2002.

[26] R. Mayer and R. Moreno, “Aids to computer-basedmultimedia learning,” Learning and Instruction,vol. 12, no. 1, pp. 107–119, 2002.

[27] L. Carroll and E. Wiebe, “Static versusdynamic presentation of procedural instruction:Investigating the efficacy of video-based delivery,”in Proc. Human Factors and Ergonomics SocietyAnnual Meeting, 2004, pp. 1059–1063.

[28] S. Schwan and R. Riempp, “The cognitive benefitsof interactive videos: Learning to tie nauticalknot,” Learning and Instruction, vol. 14, no. 3, pp.293–305, 2004.

[29] F. Ganier, “Factors affecting the processing ofprocedural instructions: Implications for documentdesign,” IEEE Trans. Prof. Commun., vol. 47, no. 1,pp. 15–26, Mar., 2004.

[30] W. Schnotz, E. Picard, and A. Hron, “How dosuccessful and unsuccessful learners use text andgraphics?,” in Comprehension of Graphics in Texts,E. De Corte and W. Schnotz, Eds. New York:Pergamon, 1993, pp. 181–199.

[31] W. Winn, “The design and use of instructionalgraphics,” in Knowledge Acquisition FromText and Pictures, H. Mandl and J. Levin,Eds. Amsterdam: North Holland, 1989, pp.125–144.

[32] J. Sweller and P. Chandler, “Why some materialis difficult to learn,” Cognition and Instruction, vol.12, no. 3, pp. 185–223, 1994.

[33] M. Gellevij, H. van der Meij, T. de Jong, and J.Pieters, “The effects of screen captures in manuals:A textual and two visual manuals compared,” IEEETrans. Prof. Commun., vol. 42, no. 2, pp. 277–291,Jun., 1999.

[34] R. Krull, S. D’Souza, D. Roy, and D. Sharp,“Designing procedural illustrations: Special issueon acquiring procedural knowledge of a technologyinterface,” IEEE Trans. Prof. Commun., vol. 47, no.1, pp. 27–33, Mar., 2004.

[35] A. Maes and H. Lenting, “How to put theinstructive space into words?,” IEEE Trans. Prof.Commun., vol. 42, no. 2, pp. 100–113, Jun., 1999.

[36] W. Stone, “Repetitive strain injuries,” Med. J.Australia, vol. 10, no. 24, pp. 616–618, 1983.

[37] R. Balci and F. Aghazadeh, “The effect of work-restschedules and type of task on the discomfort andperformance of VDT users,” Ergonomics, vol. 46,no. 5, pp. 455–465, 2003.

[38] L. Mclean, M. Tingley, R. Scott, and J. Rickards,“Computer terminal work and the benefit ofmicro-break,” Appl. Ergonomics, vol. 32, no. 3, pp.225–237, 2001.

[39] T. Williams, L. Smith, and R. Herrick, “Exerciseas a prophylactic device against carpal tunnelsyndrome,” in Proc. 33rd Ann. Meeting of theHuman Factors Society, 1989, pp. 723–727.

[40] J. Sweller and P. Chandler, “Evidence for cognitiveload theory,” Cognition and Instruction, vol. 8, no.4, pp. 351–362, 1991.

[41] N. Marcus, M. Cooper, and J. Sweller,“Understanding instructions,” J. EducationalPsychology, vol. 88, no. 1, pp. 49–63, 1996.

[42] J. Sweller, J. Van Merriënboer, and F. Paas,“Cognitive architecture and instructional design,”Educational Psychology Rev., vol. 10, no. 3, pp.251–296, 1998.

[43] J. Sweller, Instructional Design in TechnicalAreas. Melbourne, Australia: ACER Press, 1999.

[44] G. Miller, “The magical number seven, plusor minus two: Some limits on our capacity forprocessing information,” Psychological Rev., vol.63, no. 2, pp. 81–97, 1956.

[45] A. Baddely, “Working memory,” Science, vol. 255,no. 5044, pp. 556–559, 1992.

[46] M. Chi, R. Glaser, and E. Rees, “Expertise inproblem solving,” in Advances in the Psychology ofHuman Intelligence, R. Sternberg, Ed. Hillsdale,NJ: Erlbaum, 1982, pp. 7–75.

IEEE

Pro

of

Web

Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 13

[47] J. Van Merriënboer, J. Schuurman, M. de Croock,and F. Paas, “Redirecting learners’ attentionduring training: Effects on cognitive load, transfertest, performance and training efficiency,” Learningand Instruction, vol. 12, no. 1, pp. 11–37, 2002.

[48] F. Paas, J. Tuovinen, H. Tabbers, and P. VanGerven, “Cognitive load measurement as a meansto advance cognitive load theory,” EducationalPsychologist, vol. 38, no. 1, pp. 63–71, 2003.

[49] R. Brünken, L. Plass, and D. Leutner, “Directmeasurement of cognitive load in multimedialearning,” Educational Psychologist, vol. 38, no. 1,pp. 53–61, 2003.

[50] A. Glenberg, “What memory is for,” BehavioralBrain Sci., vol. 20, no. 1, pp. 1–19, 1997.

[51] A. Glenberg and D. Robertson, “Indexicalunderstanding of instructions,” DiscourseProcesses, vol. 28, no. 1, pp. 1–26, 1999.

[52] A. Glenberg, “The indexical hypothesis: Meaningfrom language, world, and image,” in Working withWords and Images: New steps in an Old Dance, N.Allen, Ed. Westport, CT: Ablex, 2002, pp. 27–42.

[53] M. Theune, B. van Schooten, R. op den Akker,W. Bosma, D. Hofs, A. Nijholt, E. Krahmer,C. van Hooijdonk, and E. Marsi, “Questions,pictures, answers: Introducing pictures inquestion-answering systems,” in Proc. ACTAS-1 ofX Symposio Internacional de Comunicacion Social,2007, pp. 450–463.

[54] A. Kendon, “Gesticulation and speech: Two aspectsof the process of utterance,” in The Relationship ofVerbal and Nonverbal Communication, M. R. Key,Ed. Mouton: The Hague, 1980, pp. 207–227.

Charlotte van Hooijdonk is a Ph.D. student at TilburgUniversity, The Netherlands. Her Ph.D. research focuses onhow users process and evaluate multimodal informationpresentations using different evaluation methods. She has beena member of the IEEE Professional Communication Societysince 2005.

Emiel Krahmer is a Full Professor at Tilburg University, TheNetherlands. His research interests include multimodal andmultisensory information processing, developing and evaluatinglanguage technology applications, and the analysis of nonverbalcommunication of real and virtual humans.

IEEE

Pro

of

Prin

t Ver

sion

IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008 1

Information Modalities for Procedural Instructions:The Influence of Text, Pictures, and Film Clipson Learning and Executing RSI Exercises—CHARLOTTE VAN HOOIJDONK AND EMIEL KRAHMER

Abstract—Much of the empirical research on the effectiveness of different instructional designs has focused ondeclarative tasks, where a learner acquires knowledge about a certain topic. It is unclear to what extent findings forlearning declarative tasks (which are not consistent on all aspects) carry over to learning procedural tasks, wherea learner acquires a certain skill. In this paper, we describe an experiment studying a specific kind of proceduralinstructions, namely exercises for the prevention of repetitive strain injury (RSI), taking information modality (textversus picture versus film clip) and difficulty degree of the exercises (easy versus difficult) into account. In theexperiment, participants had to learn RSI exercises and were asked to execute them. The results showed that aninstruction in a picture lead to the shortest learning times followed by an instruction in a film clip. An instruction intext led to the longest learning times. For the amount of practicing the exercises during the learning phase, it wasfound that the participants in the film clip condition hardly engaged in practicing the exercises during the learningphase. The participants in the picture condition engaged in a moderate amount of practicing of the exercises duringthe learning phase. The participants in the text condition engaged in the most practicing during the learning phase.The results concerning the execution times showed that an instruction in a picture led to the lowest executiontimes followed by an instruction in a film clip. The instruction in text led to longest execution times. Finally, forthe amount of correctly executed exercises, it was found that learning from a film clip led to the highest learningperformance, both for easy and for difficult exercises. Learning from an instruction in text led to a fairly good learningperformance, both for easy and difficult exercises. Learning from a picture led to a good learning performance forthe easy exercises, but the performance dropped for the difficult exercises.

Index Terms—Document design, information modality, procedural learning.

The emergence of computer-based learning hasled to new possibilities for presenting instructionsand to a renewed interest in the effects of differentinformation modalities. Instructions can be givenin plain text, but also in the form of static visuals(e.g., diagrams, pictures) or in dynamic visuals(e.g., animations, film clips). Naturally, this raisesthe question which (combinations of) modalities arebest for which learning task. This question hasbeen addressed in a large number of recent studies(e.g., [1]–[6], among many others).

Each of the aforementioned presentation modeshas its own unique characteristics, and itsown advantages and disadvantages from aninstructional perspective. Language is the basicform of human communication, and one of its mainstrengths is that it is expressive and explicit (bothfor concrete and for abstract subject matter). Anadditional advantage of its written form (as opposed

Manuscript received October 6, 2006; revised February 20,2007.The authors are with the Department of Communication& Information Sciences, Faculty of Humanities, TilburgUniversity, 5000 LE Tilburg, The Netherlands (email:[email protected]; [email protected]).Color versions of one or more of the figures in this paper areavailable online at http://ieeexplore.ieee.org.

IEEE 10.1109/TPC.2007.2000054

to the spoken variant) is that it is not transient:written sentences remain visible on paper andcan be re-read if desired, while spoken sentencesare “gone” once they have been uttered. But,reading a text requires considerable skill and effort,and, moreover, text primarily facilitates linearinformation processing.

A difference between text and pictures is theirsymbol system. The symbol system used for text isabstract/linguistic, while the one used for picturesis sensory. This affects the type of information thatthey can convey. For example, pictures can directlycommunicate perceptual information (e.g., spatialorientation and location). In other words, pictureshave a lower “articulatory distance” [7]. Moreover,pictures are not constrained by the linear structureof text, and are therefore arguably more efficient atrepresenting nonlinear relations among objects. Intext these relationships often remain implicit [8].

The focus of many recent studies has been on theinstruction effects of dynamic visuals, presumablybecause such instructions only recently havebecome a real possibility due to the increasedcomputing power of standard multimedia PCs.A number of reasons have been suggested as towhy an advantage of dynamic visuals over otherpresentation possibilities such as text and staticvisuals might be expected. For instance, it has

0361-1434/$25.00 © 2008 IEEE

IEEE

Pro

of

Prin

t Ver

sion

2 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

been argued that dynamic visuals are beneficialfor learning since they offer a “complete model” ofa learning task (e.g., [1], [9]); in other words, theyare “informationally complete” [10, p. 249]. Whenstatic visuals or text are used, learners themselveswill have to construct a mental representation ofthe temporal aspects in the instruction. It has beenargued that dynamic illustrations offer a betterrepresentation of these temporal aspects, therebyreducing the level of abstraction and supporting adeeper level of understanding than static visualswould do [9]. Arguably, this should facilitatelearning since it would reduce learning times,require less practice, and result in fewer errors.

A substantial number of studies have tried todemonstrate this presumed learning benefit, butwith mixed results (e.g., [1], [5], [11], and [12]). Twogroups of explanations for these mixed results havebeen offered: one group of explanations stressesthe disadvantages of dynamic visuals; anothergroup points to methodological problems in thesestudies. The first explanation claims that dynamicvisuals have a fixed duration which viewerssimply have to watch. This set time period maylead to inherently longer learning times [11]. In asimilar vein, Lewalter [1] suggested that dynamicvisuals may take more time to process than otherpresentation modes; the information in dynamicvisuals changes continuously, and as a resultlearners could be overwhelmed with the amount ofinformation. Alternatively, it has been suggestedthat dynamic visuals are processed somewhatsuperficially; they require little cognitive effort, asthey can be watched rather passively (e.g., [10]and [13]). The second explanation, by contrast,argues that the mixed findings so far are primarilydue to methodological problems in the variousexperimental studies. For instance, to be reallybeneficial, dynamics should have some added value[14], which is not always the case. When it comesto temporal sequencing or indicating direction ofmotion, it has been claimed that arrows in staticvisuals may be just as effective as dynamic visuals(e.g., [15]). Tversky, Morrison, and Bétrancourt [11]point out that in some comparative studies there isa lack of equivalence between dynamic and staticvisuals in content or procedures (e.g., because thedynamic visuals convey more information or involveinteractivity). Some researchers have argued thateven when dynamic visuals do not lead to improvedlearning, they are more attractive and motivatingthan other instruction forms and should bepreferred for that reason (e.g., [11], [16], and [17]).However, this “subjective satisfaction” [18, p. 33] of

various instruction modes is often not addressed,and when it is the results are equivocal (e.g., [19]).A further complicating factor is that the learningdomain itself might also have an influence on therelative benefits of information modalities [14]. Itseems reasonable to assume that different learningdomains benefit from different kinds of instructions(cf., also [20]). In the field of learning psychology,many studies have shown that dynamic visualsare used as learning instructions for descriptive,“scientific” explanations (often describing causesand effects), such as mechanical and electronicsystems (e.g., brakes, pumps, generators [21],[22]; electronic circuit systems [23]); mathematicalconcepts (e.g., algebra [24]); or complex naturalphenomena (e.g., electricity [25]; lightning [26];gravitational lensing [1]; meteorological changes[2]). Arguably, some of these scientific learningdomains lend themselves better for dynamicvisualization than others. Moreover, what holdsfor descriptive tasks (such as those above), maynot apply to procedural ones, such as bandaginga hand [5], folding origami models [27], or tyingnautical knots [28]. These procedural tasks differfrom descriptive ones in a number of respects. Notonly is the nature of the task different (proceduresare inherently more linear—one step followinganother), but the learning goal is also different (thefocus is not only on understanding, but also onacquiring certain capabilities or skills) [29]. Oneof the factors that might affect the processing ofprocedural information is the design format [29].Procedural information is often presented in atext and/or picture format (e.g., route directions,maintenance instructions, assembly instructions).While pictures may help users to form a mentalmodel of the procedure [30], [31], the combinationof text plus picture may not always be helpful. Forexample, a user’s attention has to switch betweenthe information presented in text and picture,leading to the so-called split attention effect [32].Therefore, document designers have to understandthe strengths and limitations of both text andpictures when designing procedural information[8]. Both the type of picture (e.g., overview versuspartial view [33], line drawing versus photo [5],object-centered versus body-centered [34], etc.)as well as the type of textual instruction (e.g.,user-centered versus object-centered versusenvironment-centered [35]) may influence auser’s performance of the procedure. To furthercomplicate the situation, it may well be that besidesvariation between domains there is also variationwithin domains. Arguably, some learning tasks ina given domain are easier than others, and this

IEEE

Pro

of

Prin

t Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 3

may have an influence on the relative contributionof various instruction formats for those tasks.

In this paper, we report on a study of the effects oftask difficulty and information modality (comparingdynamic visuals with static visuals and text) onlearning a specific class of procedural tasks,namely exercises aiming at the prevention ofrepetitive strain injury (RSI). RSI is a general termfor disorders that are caused by repetitious use ofhands, arms, and shoulders, often as a result ofprolonged computer terminal work [36]. Recently,the term CANS (short for complaints of arm, neckand/or shoulders) was introduced as an alternativeterm, but here we will stick to the more familiarRSI terminology. It is generally assumed thattaking regular breaks during computer work incombination with exercises is beneficial for theprevention of RSI (e.g., [37]–[39]). RSI exercisesoffer a new and understudied learning domain,with various interesting properties. Generally,these exercises involve little or no abstraction, donot consist of a complicated sequence of actions,and are relatively short. Moreover, the exercisesare highly recognizable (almost everybody has twohands). It is interesting to observe that currentRSI information websites offer a large variety ofprevention exercises, in many different presentationformats (see Fig. 1 for a representative selection).These differences raise the natural question of

what is the effectiveness of the various presentationformats. Note also that there is substantialvariation in the difficulty level of existing RSIexercises, so that the other factor of interest(variation in task difficulty) can be modelled in afairly straightforward way in this domain.

The potential influence of both presentationmodality and task difficulty on learningperformance can be motivated from COGNITIVE

LOAD THEORY (e.g., [40]–[43]), a theory which aimsto develop optimal instructional designs whileconsidering the limitations of the human mind.Cognitive load theory builds on the broadly acceptedassumption that the mind combines a short-term(or working) memory of very limited capacity (whereall conscious activity and processing of informationoccurs [44], [45]) and a long-term memory witha virtually unlimited capacity [42]. According tocognitive load theory, learning amounts to theconstruction of new (or modification of existing)schemata [46], which are subsequently stored inlong-term memory. Since the capacity of workingmemory is severely limited, the cognitive load oflearners should be kept at a minimum duringlearning. In the current version of the theory [42],[43], three kinds of load are distinguished: INTRINSIC

load, caused by the structure and intrinsic natureof the learning task; EXTRANEOUS load, causedby the manner of presentation and its influence

Fig. 1. Three different websites using different presentation formats to illustrate RSI prevention exercises.

IEEE

Pro

of

Prin

t Ver

sion

4 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

on working memory; and GERMANE load, causedby learners’ efforts to process and comprehendlearning material. The sum of these three kinds ofload should not exceed working memory capacityin order to avoid cognitive overload. As arguedabove, the intrinsic load may vary both betweenand within task domains. Since the intrinsic loadof a particular learning task is fixed, instructiondesign can only influence the extraneous and thegermane load, and, obviously, when the intrinsicload of a particular task is high, there is relativelyless room for the extraneous and the germane load.

Germane load is a “positive” form of load, sinceit encourages learners to engage in cognitiveprocessing that may lead to improved schemataconstruction [42], [47]. This suggests that optimalinstructions are those which minimize extraneousand maximize germane load, while simultaneouslyavoiding overload. A complicating factor is thatthe distinction between extraneous and germanecognitive load, although intuitively plausible,cannot reliably be measured [47]. In general,measuring cognitive load in realistic learningsituations is not straightforward [48]. While somerecent attempts have been made to measurecognitive load directly (e.g., the dual task approachadvocated by Brünken, Plass, and Leutner [49]),many studies rely on indirect measures such asbehavioural or learning outcome measures (e.g.,[47] and [49]). Although these measures only relateto cognitive load indirectly, they do assess learningbehaviour directly, which is sufficient for ourcurrent purposes. In this study, we concentrateon learning times, amount of practicing duringlearning, execution times, and execution errors.

Arguably, the different information modalitiesof interest (text, picture, film clip) have differentloading potentials, which may influenceperformance on one or more of these measures.Of all three modalities, it seems likely that textimposes the highest load: It requires more mentaleffort to read a text than to watch a picture or afilm clip, hence it seems likely that learning fromtext takes longer than learning from pictures orfilm clips. Potentially, an additional complicationfor learning RSI prevention exercises from text isthat schemata construction may be more involvedthan for pictures and film clips. In the text case,learners have to “translate” the textual instructionsto manual gestures. Notice that this is a concreteinstance of Glenberg’s Indexical Hypothesis, statingthat readers associate words and phrases withobjects and actions in “the real world,” whichshould facilitate understanding (e.g., [50]–[52]). An

instruction in the form of a film clip and (probablyto a lesser extent) a picture depicts the hand andarm movements the learner should make, whereasthe learner has to infer these gestures when theyare presented in text. Hence, it is hypothesized thatlearners will practice more often while learning fromtext than while learning from visual presentations.An interesting question is whether the load imposedby text is only extraneous or also partly germane.It might be that the extra effort required to learnfrom text leads to good learning results (shortexecution times, few execution errors), especiallywhen the intrinsic load is low (so that overload canbe avoided).

Pictures arguably impose the lowest load of thethree presentation modalities; viewing a staticpicture requires little mental effort. Since a picturecaptures the essential information of a procedure, itis to be expected that learning times for pictures arerelatively short. However, due to their static nature,pictures offer little information about the temporalstructure of a procedure, and for more complicatedexercises (i.e., exercises with a higher intrinsic load)this may hamper schemata construction and mightresult in suboptimal execution performance.

To what extent film clips impose cognitive loadis uncertain. On the one hand, it can be arguedthat they may induce load, since the film clipscontinuously change and learners have to processthis information, which reduces the cognitiveresources available for germane load. But theymay also lessen cognitive load by relieving thelearner in the translation process, which isrequired when reading text. The main strengthof film clips might well be their “informationalcompleteness”—learners do not have to infer theexact sequence of movements from text or froma single snapshot because the entire sequence ofactions is visualized, which arguably facilitatesschemata construction. This “completeness”suggests that learners will not practice muchduring learning. Whether this will also result in fewexecution errors is not obvious; it may be that thelarge amount of external support relieves learnersof cognitive load demands that they would beable to fulfil, but which might prevent them fromperforming beneficial cognitive actions for learning.

The findings of this study will not only betheoretically relevant, but may also informmore practical work on multimodal informationpresentation. An example of such work is amedical question-answering (QA) system, which iscurrently developed within the IMOGEN (Interactive

IEEE

Pro

of

Prin

t Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 5

Multimodal Output GENeration) project that isembedded in the IMIX research programme. IMIX(Interactive Multimodal Information eXtraction)is a research programme in the field of Dutchspeech and language technology. A QA system isan automatic system that can answer a user’squestion posed in natural languages (e.g., Whatdoes RSI stand for?) with an answer formulated innatural language (e.g., RSI stands for RepetitiveStrain Injury). Today, QA systems are not onlyexpected to give answers to simple questions,like What does RSI stand for? but also to morecomplex questions, like How should I organize myworkspace in order to prevent RSI? or What is a goodexercise to prevent RSI in my hands? The answers tothese questions might be more informative if theycontained a picture or an animation, rather thanan extensive textual answer [53]. In the IMOGENproject, different aspects of multimodal informationpresentation are studied in order to improve theoutput quality of QA systems.

To find out what the actual strengths andweaknesses of the various information modalitiesare, we describe an experiment in whichparticipants have to learn and execute 20 RSIprevention exercises in two degrees of difficulty.We measure the following features: the influence ofpresenting an instruction in text; picture or filmclips on learning times; the amount of practicingduring learning; execution times; and number ofexecution errors. Besides these objective measures,participants are also asked for their subjectivesatisfaction.

METHOD

Participants Participants were 30 young adults,between 18 and 30 years of age. Of the participants,15 were male and 15 were female.

Design The experiment had a 3 (informationmodality) 2 (difficulty degree) factorial design,with information modality (dynamic visual [filmclip], static visual [picture], text) used as abetween-participants variable, and difficultydegree (easy, difficult) as a within-participantsvariable, and with learning times, amount ofpracticing during learning, execution times, andnumber of correctly executed exercises were usedas dependent variables. The participants wererandomly assigned to an experimental condition.

Stimuli A total of 20 RSI prevention exercises werechosen from websites on RSI prevention and RSIinjury prevention software (e.g., http://web.mit.

edu/atic/www/disabilities/rsi/exercises.html andhttp://www.workpace.com). The chosen exerciseswere all exercises to prevent RSI in hands and arms.Of the 20 exercises, 10 were assumed to be easy toperform, and 10 were assumed to be difficult. Thecriterion for a difficult exercise was that it shouldbe either a complex symmetrical movement or anasymmetrical movement. COMPLEX SYMMETRICAL

MOVEMENTS were classified as movements consistingof at least three sequential atomic movements, inthe course of which both arms and hands makethe same movements. ASYMMETRICAL MOVEMENTS

were classified as movements in which theparticipant should make a different movement witheach arm or hand. EASY EXERCISES were neitherasymmetrical nor complex movements. Fig. 2shows representative examples of an easy and adifficult exercise.

Fig. 2. Two typical exercises: one easy and one difficult.

Note that this operationalization of easy anddifficult tasks is based on the relative complexity ofthe sequence of motoric movements. To find out towhat extent this objective criterion coincided withsubjective perception of task difficulty, a pre-testwas carried out in which nine participants wereasked to classify the 20 exercises (presented intext plus picture format, and in random order).They were instructed to make two piles, the firstconsisting of the 10 exercises they consideredeasiest to perform, and the second pile consistingof the 10 exercises they considered more difficultto perform. It turned out that of the 10 exercisesclassified as easy, 7 were indeed perceived as easy

IEEE

Pro

of

Prin

t Ver

sion

6 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

by the vast majority ( 75%) of the participants,while the remaining 3 were perceived as difficult bya majority of the participants (presumably becausethese consisted of gestures that are motoricallysimple but not commonly used and hence withwhich participants may have been less familiar).Of the 10 exercises classified as difficult, 9 wereindeed perceived as such by the vast majority( 75%), while the remaining 1 was perceived aseasy by most participants (interestingly, this wasan exercise that was motorically complex butfamiliar to most participants). In sum, for 16 ofthe 20 exercises, there was a clear and consistentcorrelation between the objective and the perceiveddifficulty degree.

The 20 RSI prevention exercises were presentedin three different formats. In the text condition,the exercises were explained to the participantsin a purely textual format. The total amount ofwords did not differ between the 10 easy and 10difficult exercises: both counted 268 words in total.Thus, the average length was almost 27 words perexercise. Since some exercises are a few wordsshorter and others a few words longer, we will onlyreport on mean total averages for the 10 exercisesin each condition. Because natural language isoften ambiguous, the text exercises were checkedfor their comprehensibility in a second pre-testwith three participants (different from those in thefirst pre-test). It turned out that a few exercises ledto misunderstandings and so these exercises werereformulated. The new versions were presented tothe same three participants, and they agreed thatthe reformulations solved the misunderstandings.

In the picture condition, each of the 20 exerciseswas displayed in a single photograph. Digitalcamera pictures were taken of a female who didthe exercises. She wore black clothing and themovements were shot against a black backgroundso that only her hands were visible in the picture.The photo depicted the STROKE of the movement,with the phase of the movement as it unfoldsin time containing the “semantic content” ofthe movement [54]. To indicate the direction ofmovement, arrows were added to the pictures of9 difficult and 2 easy exercises. The size of thepictures was 1536 1014 pixels.

For the film clip condition, the same female in anidentical surrounding was filmed with a digitalfilm camera (25 frames per second). The totalamount of frames did not differ between easy

and difficult exercises: both counted 1097 frames(average film length was thus 4.39 seconds). Again,since some film clips are somewhat shorter, andothers somewhat longer, we will report on meantotal averages for the 10 easy and the 10 difficultexercises.

The exercises were presented to the participantsvia a website: one website for each condition. Theexperiment was run on a multimedia PC, with a17-inch color monitor. In all three conditions, thepresentations of the RSI exercises appeared at thecentre of the display. On the video website, themovements were shown in a film clip with a startand a stop button and a slider. The participantshad the opportunity to view the film as often as theydesired, but not much use was made of this option.Exercises were presented in one of two randomorders to control for possible learning effects.

Procedure The experiment consisted of threeparts: a practice session, the central part inwhich the 20 RSI exercises were presented, anda questionnaire. The participants completed eachpart individually. Each participant was invited toan experimental laboratory, where he or she tooka seat behind the computer. Participants were toldthat they would receive 20 exercises which theyhad to learn and perform to assess to what extentthey suffered from RSI. In addition, they were led tobelieve that their hand and arm movements werefilmed because a doctor would later look at therecordings of the participants to determine to whatextent they suffered from RSI. The procedure of theexperiment is depicted in Fig. 3.

After the participants had read the instructions,they could click on the hyperlink start, and the firsttrial exercise would appear. Depending on theirexperimental condition, the participants read orviewed the trial exercise until they thought that theycould properly execute the exercise. Subsequently,they clicked the hyperlink next at the bottom ofthe page. A new webpage appeared with the text“execute trial exercise” at the centre of the page.During the execution of an exercise, participantscould not see the instruction. When they hadexecuted the exercise, they clicked the hyperlinknext exercise at the bottom of the page. Aftercompleting the second trial exercise, participantswere asked if they had any questions regarding theexperimental procedure. If not, the participantscould start the actual experiment, proceeding in thesame way as they did in the trial session. Duringthe experiment, there was no further interactionbetween participants and the experiment leader.

IEEE

Pro

of

Prin

t Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 7

Fig. 3. Procedure of the experiment.

After the participants completed the 20 experimentalexercises, they received a questionnaire. In thisquestionnaire, the subjective satisfaction regardingthe content and structure of the website as wellas the comprehensibility and the attractiveness ofthe exercises were measured. The questionnaireconsisted of 16 bi-polar seven-point semanticdifferentials addressing structure and content ofthe websites, as well as comprehensibility andattractiveness of the exercises (see Appendix A).The presentation order of the adjectives wasrandomized. For processing, the positive adjectiveswere mapped to one (very positive) and the negativeadjectives were mapped to seven (very negative).

The participants were debriefed at the end of theexperiment.

Data Processing The following data was collected:learning times; number of practiced exercisesduring the learning time; execution times; numberof correctly executed exercises; and the results ofthe questionnaire.

The time it took the participants to learnand execute the exercises was computedfrom the data of the log program ProxyPlus(http://www.proxyplus.com). This programregistered the times associated with participants’mouse clicks during the experiment. The timeperiod between clicking the hyperlink next whichpreceded the instruction of an exercise and thehyperlink next which followed the instructionof an exercise was defined as the LEARNING TIME

(see Fig. 3). The time period between clicking thehyperlink next which followed the instruction of anexercise and the hyperlink next that preceded anew instruction of an exercise was defined as theEXECUTION TIME (see Fig. 3).

A digital video camera was used to recordthe movements the participants made duringthe experiment. These video recordings of theparticipants’ hands and arms were used to assesswhether the participants practiced the exercisesduring the learning period and to assess theirperformances while executing the RSI exercises.Occasionally, a participant performed an exercisein a correct but not intended way. These cases werecounted as correctly executed exercises. One judgedid the assessment; scoring was easy, and the fewdifficult cases that did arise were resolved afterdiscussion.

Tests for significance were performed using arepeated measures analysis of variance (ANOVA),with a significance threshold of 0.05. For post-hoctests the Bonferroni method was used. Theinternal consistency of the four item sets of thequestionnaire was measured using Cronbach’salpha.

RESULTS

Learning Times Table I summarizes the results.First consider the learning time. We found thatthe difficulty degree had an effect on the amountof time to learn the exercises ( ,

). Overall, the participants needed moretime to learn the difficult exercises than the easyones. Also, the information presentation had

IEEE

Pro

of

Prin

t Ver

sion

8 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

an effect on the learning time ( ,). Participants in the picture condition

required the shortest learning times, participantsin the text condition had the longest learningtimes, with learning times from film clips falling inbetween those two. Post-hoc tests indicated thatthe instruction in text differed significantly from theinstruction in a picture . There was nosignificant difference between the instruction in textand instruction in a film clip . Also, theinstruction in a picture did not differ significantlyfrom the instruction in a film clip . Nosignificant interactions between difficulty degreeand information modality were found.

Amount of Practicing During the Learning PeriodTable I also reveals that the participants practicedthe difficult exercises more often than the easy onesduring the learning period, and this difference wasfound to be statistically significant ( ,

). Also, information presentation had aneffect on the amount of practicing during thelearning period ( , ). In thefilm clip condition, participants almost neverpracticed; in the picture condition, they practicedfor about a fifth of the exercises, while in the textcondition, participants practiced about half of theexercises. Post-hoc tests showed that the differencebetween the instruction in text and the instructionin a picture was not significant . Textdiffered significantly from the instruction in a filmclip . There was no significant differencebetween the instruction in a picture and a film clip

, presumably because of the relativelyhigh standard deviation. No significant interactioneffects were found.

Execution Times The difficulty degree had amain effect on the amount of time needed toperform the exercises ( , ).The participants required more time to execute

the difficult exercises than the easy ones. Therewas also a main effect of information modality onthe execution times ( , ).Participants in the text condition had much longerexecution times than those in the picture andfilm clip conditions. Participants in the picturecondition needed somewhat less time for executionthan the participants in the film clip condition,but the differences are relatively small and thestandard deviations are relatively high. Post-hoctests indicated that there was a significantdifference between the instruction in text andthe instruction in a picture . Also, theinstruction in text significantly differed from theinstruction in a film . There was nosignificant difference between the instruction in afilm clip and in a picture . There was nosignificant interaction between difficulty degree andinformation modality.

Amount of Correctly Executed Exercises Amain effect of difficulty degree on the performancewas found ( , ). As canbe seen in Table I, the participants executed onaverage 9.1 easy exercises correctly, as opposedto 8.3 difficult ones. There was also a maineffect of information modality: the participantswho watched the film clip executed the mostmovements correctly ( , ). Atwo-way interaction between difficulty degree andinformation modality was found ( ,

). This interaction effect can be explained asfollows: For the instruction in text ( ,

) and for the instruction in a film clip( , ) no significant differenceswere found in the amount of correctly executedeasy and difficult exercises. However, for theinstruction in a picture ( , ),a significant difference was found in the amountof correctly executed easy and difficult exercises.

TABLE ILEARNING PERFORMANCE RESULTS PER DEGREE OF DIFFICULTY AND INFORMATION MODALITY

IEEE

Pro

of

Prin

t Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 9

The participants in the picture condition correctlyexecuted fewer of the difficult RSI exercises thanthe easy exercises (respectively, 9.2 versus 7.6).

Subjective Satisfaction The internal consistencyon the four item sets in the questionnaire wasmeasured using Cronbach’s alpha. Followingstandard practice, we qualify alpha as adequate ifits value was higher than 0.70. For the structure ofthe website the alpha was 0.78, and for the contentof the website it was 0.56. In Table II, we reportseparately on the items concerning the content ofthe website. The alpha for the comprehensibilityof the exercises was 0.82, and 0.83 for theirattractiveness.

TABLE IIMEAN RESULTS FOR THE SUBJECTIVE SATISFACTION

QUESTIONNAIRE PER INFORMATION MODALITY (FROM 1 =“VERY SATISFIED” TO 7 = “VERY DISSATISFIED”

Table II gives an overview of the results of thesubjective satisfaction questionnaire. Informationmodality had no effect on the subjective satisfactionregarding the website and the exercises. No effectswere found between the three conditions for thewebsite and for the exercises .

CONCLUSION AND DISCUSSION

In this paper, we investigated the learnability ofRSI prevention exercises in different presentationformats. What effective ways of learning suchexercises is an important research question withclear practical ramifications, in view of the growing

awareness of the importance of RSI prevention andthe bewildering array of presentation formats forRSI prevention exercises currently being employedon the internet and in RSI prevention software.

An experiment was performed looking at theeffect of offering RSI prevention exercises in threedifferent formats (film clips, pictures, or text)focusing on learning time, the amount of practicing,execution time, learning results, and subjectivesatisfaction. To model variation within a domain,20 RSI prevention exercises were selected fromdifferent sources in such a way that 10 exerciseswere motorically easy (symmetric and simple) and10 exercises were more difficult (asymmetric orcomplex). A pre-test with nine participants revealedthat there was a strong connection betweenobjective and perceived difficulty degree.

As a manipulation check, we tested whethereasy exercises were “easier” to perform thanthe assumed difficult ones. The results showedthat irrespective of presentation format, the easyexercises were associated with shorter learningtimes, less practicing, shorter execution times, andless execution errors than the difficult exercises.It is worth repeating that the summed length (interms of frames and number of words) of the 10easy exercises was exactly the same as the summedlength of the 10 difficult ones, so that theseeffects can only be attributed to the differences incomplexity (intrinsic cognitive load).

Of the three presentation formats underinvestigation, text was expected to impose thehighest load. Overall, text indeed led to the longestlearning times, which can in part be ascribed tothe fact that it takes more mental effort to read atext than to look at a picture or watch a film clip.Furthermore, during the learning phase, people notonly read the text, but also engage in a substantialamount of practicing, which takes time as well.People in the text condition by far did the mostpracticing, which is consistent with the IndexicalHypothesis [50]–[52]: To foster understanding,participants “translated” the textual instructionsinto actual movements during learning for manyexercises. Participants must thus engage in fairlydeep processing of the textual instructions, butarguably this is to some extent a form of germaneload: The deep processing and practicing appearto be beneficial for learning. Contrary to whatone might expect, the relatively long learningphase does not lead to shorter execution times.Still, it does lead to a surprisingly good learningperformance. For the easy exercises, participants

IEEE

Pro

of

Prin

t Ver

sion

10 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

in the text condition made a few more errors thanparticipants in the other two conditions, but for thedifficult exercises, performance was even slightlybetter than for pictures, which suggests that thegermane activities pay off.

Pictures were expected to impose the lowest load.The average learning times were indeed lowest inthe picture condition, as were the average executiontimes. The pictures did not offer a complete modelof the exercise, because only the stroke of themovement was depicted, with arrows indicatingmotion where applicable. The expectation wasthat people generally would be able to derive thecomplete exercise on the basis of this information.This might explain why a moderate amount ofpracticing took place in this condition (morethan for film clips). For easy exercises, learningfrom pictures led to a good performance. In fact,participants made as few errors for these exercisesas participants in the film clip condition. But theperformance dropped for the difficult exercises,where as many errors were made as in the textcondition.

Concerning the load of film clips, the followingtwo contrasting hypotheses were mentioned: Theymight induce load because a learner has to processthe continuously changing images, but they mightalso reduce load, freeing the learner by simplypresenting a complete, physical model of the taskto be carried out. It was found that film clips ledto medium length learning times (between pictureand text). In part, this can be attributed to thefact that watching a clip takes a fixed length.But it is interesting to see that difficult exercisesrequired longer learning times than easy ones, eventhough they were of the same average duration(and it is not the case that learners practiceddifficult exercises more often), which was probablydue to the higher average intrinsic load of thedifficult tasks. There was virtually no practicingin the film clip condition, as expected, since theclips offer an informationally complete model ofthe task. Contrary to expectation, the executiontimes were not the shortest, which suggests thatparticipants still had to engage in cognitive activityduring the execution phase. Film clips did lead toa consistently high learning performance, bothfor easy and for difficult exercises. Hence, despitepracticing during learning, learners do constructthe necessary scheme based on germane cognitiveprocesses.

It is interesting to observe that none of thepresentation formats appeared to be superior on all

dimensions of interest; each had some disadvantage(less efficient for learning, relatively many errors,etc.). In view of this, it is perhaps not surprisingthat the subjective satisfaction questionnaire didnot reveal any significant differences. The factthat no single modality outperformed the otherson all dimensions implies that there is no single,straightforward design recommendation on howto present information (e.g., in the context of amedical QA system). The goal of the informationpresentation influences the type of presentation.For example, if the amount of practicing isconsidered to be the most important factor, the RSIexercises are best described in text. However, ifquick leaning is most important, the RSI exercisesare best illustrated with a picture. Finally, whenthe goal of the information presentation is agood overall execution, the RSI exercises are bestvisualized with a film clip.

RSI prevention exercises offer a new and ecologicallyvalid learning situation with a number of interestingproperties. These exercises are quite brief andarguably relatively easy to learn. A downside of thischoice is that, with respect to learning performance(number of errors), there may be a ceiling effect,in that easy exercises are learned very well for allthree modalities. It would be interesting to redo theexperiment with more complex RSI related tasks,and see whether this change would lead to moredifferentiation between the different modalitieswhere errors are concerned.

While we did our best to make sure that theexercises in the three conditions offered comparableinformation, following the recommendations ofTversky et al. [11], it turned out that this wasnot always straightforward. While a static picturecombined with an arrow indicating direction ormotion can be very informative, it does not makethe entire intended movement explicit as a dynamicpicture does. In the former case, but not in the lattercase, the learner has to infer the full movement,which may lead to errors, especially for the difficultexercises as we have seen. Still, it is interestingto see that the efficiency of pictures (learning andexecution times) is higher than that of the othertwo modalities, and leads to nearly optimal resultsfor easy exercises. This finding indicates that aparticularly efficient method for illustrating morecomplex exercises might be via a series of picturesdepicting key stages of the procedure. One wouldexpect that this could lead to both high efficiencyand good learning results, for easy as well as fordifficult exercises. It would also be interesting tostudy the effect of series of pictures versus single

IEEE

Pro

of

Prin

t Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 11

pictures, or film clips in multimodal instructions(e.g., the use of visuals plus speech). We intend toexplore these possibilities in future research.

In a somewhat similar vein, we found that certainRSI exercises seem to be inherently easier torepresent than others, and especially that thisease-of-representation may vary across differentpresentation formats. Some movements can beconcisely described in language because the entiremovement has been “coded” in a fixed expression(e.g., make fists), whereas other movements can be

rather cumbersome to describe. Also, expressinghow a particular movement “feels” (e.g., spreadyour fingers until a mild stretch between the fingersis felt) is obviously easier in language than instatic or dynamic visuals. For such exercises, atextual presentation might have an added valueover other presentation formats. It would beinteresting to systematically vary the presence orabsence of linguistic short cuts (describing complexmovements in a few words) and investigate howthis variance influences the effectiveness of textualinstructions.

APPENDIX

SUBJECTIVE SATISFACTION QUESTIONNAIRE

(ENGLISH TRANSLATION OF DUTCH ORIGINALS)

(1) The structure of the website is: (a)Orderly/Disorderly (b) Clear/Unclear (c) Easyto access/Not easy to access (d) Easy to seethrough/Not easy to see through

(2) The content of the website is: (a)Informative/Uninformative (b) Clear/Unclear(c) Comprehensible/Incomprehensible (d)Understandable/Unintelligible

(3) The comprehensibility of the exercisesis: (a) Easy/Difficult (b) Simple/Hard (c)Clear/Unclear (d) Unambiguous/Ambiguous

(4) The attractiveness of the exercises is: (a)Varied/Unvaried (b) Interesting/Uninteresting(c) Appealing/ Unappealing (d) Fascinating/Tiresome

ACKNOWLEDGMENT

A preliminary analysis of the experiment hasappeared in Dutch: C. van Hooijdonk andE. Krahmer, “De invloed van unimodale enmultimodale instructies op de effectiviteitvan RSI-preventieoefeningen,” Tijdschrift voorTaalbeheersing, vol. 28, no. 2, pp. 73–87, 2006.

The current research is part of the InteractiveMultimodal Information eXtraction (IMIX) project,which is funded by the Netherlands Organisationfor Scientific Research (NWO). Thanks are dueto P. Barkhuysen and L. van der Laar for theirhelp in setting up the experiment, and to F. Maesand N. Ummelen for their useful comments on an

earlier version of this paper. Moreover, we wouldlike to thank the two anonymous reviewers for theirsuggestions.

REFERENCES

[1] D. Lewalter, “Cognitive strategies for learningfrom static and dynamic visuals,” Learning andInstruction, vol. 13, no. 2, pp. 177–189, 2003.

[2] R. Lowe, “Interrogation of a dynamic visualizationduring learning,” Learning and Instruction, vol. 14,no. 3, pp. 257–274, 2004.

[3] R. Mayer, Multimedia Learning. New York:Cambridge Univ. Press, 2001.

[4] R. Mayer, “The promise of multimedia learning:Using the same instructional design methodsacross different media,” Learning and Instruction,vol. 13, no. 2, pp. 125–139, 2003.

[5] I. Michas and D. Berry, “Learning a proceduraltask: Effectiveness of multimedia presentations,”Appl. Cognitive Psychology, vol. 14, no. 6, pp.555–575, 2000.

[6] R. Ploetzner and R. Lowe, “Dynamic visualisationsand learning,” Learning and Instruction, vol. 14,no. 3, pp. 235–240, 2004.

[7] T. Williams and D. Harkus, “Editing visualmedia,” IEEE Trans. Prof. Commun., vol. 41, no. 1,pp. 33–45, Mar., 1998.

[8] J. Larkin and H. Simon, “Why a diagram is(sometimes) worth a thousand words,” CognitiveSci., vol. 11, no. 1, pp. 65–99, 1987.

[9] O. Park and R. Hopkins, “Instructional conditionsfor using dynamic visual displays: A review,” J.Instructional Sci., vol. 21, no. 6, pp. 427–449, 1993.

[10] W. Schnotz, J. Böckheler, and H. Grzondziel,“Individual and co-operative learning withinteractive animated pictures,” Eur. J. PsychologyEdu., vol. 14, no. 2, pp. 245–265, 1999.

[11] B. Tversky, J. Morrison, and M. Bétrancourt,“Animation: Can it facilitate?,” Int. J.Human-Comput. Stud., vol. 57, no. 4, pp. 247–262,2002.

IEEE

Pro

of

Prin

t Ver

sion

12 IEEE TRANSACTIONS ON PROFESSIONAL COMMUNICATION, VOL. 51, NO. 1, MARCH 2008

[12] M. Bétrancourt and B. Tversky, “Effects ofcomputer animation on user’s performance: Areview,” Le Travail Humain, vol. 63, no. 4, pp.311–329, 2000.

[13] W. Schnotz and T. Rasch, “Enabling, facilitating,and inhibiting effects of animations in multimedialearning: Why reduction of cognitive load can havenegative results on learning,” Educational Technol.Res. Devel., vol. 53, no. 3, pp. 47–58, 2005.

[14] R. Weiss, D. Knowlton, and G. Morrison,“Principles for using animation in computer-basedinstruction: Theoretical heuristics for effectivedesign,” Comput. Human Behavior, vol. 18, no. 4,pp. 465–477, 2002.

[15] B. Tversky, J. Zacks, P. Lee, and J. Heiser,“Lines, blobs, crosses, and arrows: Diagrammaticcommunication with schematic figures,” in Theoryand Application of Diagrams, M. Anderson, P.Cheng, and V. Haarslev, Eds. Berlin: Springer,2000, pp. 221–230.

[16] E. Perez and M. White, “Student evaluationof motivational and learning attributes ofmicrocomputer software,” J. Comput.-BasedInstruction, vol. 12, no. 2, pp. 39–43, 1985.

[17] L. Rieber, “Animation, incidental learning, andcontinuing motivation,” J. Educational Psychology,vol. 83, no. 3, pp. 318–328, 1991.

[18] J. Nielsen, Usability Engineering. New York:Academic, 1993.

[19] J. Pane, A. Corbett, and B. John, “Assessingdynamics in computer-based instruction,” in Proc.ACM Conf. Human Factors in Computing Systems,1996, pp. 797–804.

[20] M. Hegarty, “Dynamic visualizations and learning:Getting to the difficult questions,” Learning andInstruction, vol. 14, no. 3, pp. 343–351, 2004.

[21] R. Mayer, “Systematic thinking fostered byillustrations in scientific text,” J. EducationalPsychology, vol. 81, no. 2, pp. 240–246, 1989.

[22] R. Mayer and J. Gallini, “When is an illustrationworth a thousand words?,” J. EducationalPsychology, vol. 82, no. 4, pp. 715–726, 1990.

[23] O. Park and S. Gittelman, “Dynamiccharacteristics of mental models and dynamicvisual displays,” J. Instructional Sci., vol. 23, no.5–6, pp. 303–320, 1995.

[24] S. Reed, “Effects of computer graphics onimproving estimates to algebra word problems,”J. Educational Psychology, vol. 77, no. 3, pp.285–298, 1985.

[25] P. Cheng, “Electrifying diagrams for learning:Principles for complex representational systems,”Cognitive Sci., vol. 26, no. 6, pp. 658–736, 2002.

[26] R. Mayer and R. Moreno, “Aids to computer-basedmultimedia learning,” Learning and Instruction,vol. 12, no. 1, pp. 107–119, 2002.

[27] L. Carroll and E. Wiebe, “Static versusdynamic presentation of procedural instruction:Investigating the efficacy of video-based delivery,”in Proc. Human Factors and Ergonomics SocietyAnnual Meeting, 2004, pp. 1059–1063.

[28] S. Schwan and R. Riempp, “The cognitive benefitsof interactive videos: Learning to tie nauticalknot,” Learning and Instruction, vol. 14, no. 3, pp.293–305, 2004.

[29] F. Ganier, “Factors affecting the processing ofprocedural instructions: Implications for documentdesign,” IEEE Trans. Prof. Commun., vol. 47, no. 1,pp. 15–26, Mar., 2004.

[30] W. Schnotz, E. Picard, and A. Hron, “How dosuccessful and unsuccessful learners use text andgraphics?,” in Comprehension of Graphics in Texts,E. De Corte and W. Schnotz, Eds. New York:Pergamon, 1993, pp. 181–199.

[31] W. Winn, “The design and use of instructionalgraphics,” in Knowledge Acquisition FromText and Pictures, H. Mandl and J. Levin,Eds. Amsterdam: North Holland, 1989, pp.125–144.

[32] J. Sweller and P. Chandler, “Why some materialis difficult to learn,” Cognition and Instruction, vol.12, no. 3, pp. 185–223, 1994.

[33] M. Gellevij, H. van der Meij, T. de Jong, and J.Pieters, “The effects of screen captures in manuals:A textual and two visual manuals compared,” IEEETrans. Prof. Commun., vol. 42, no. 2, pp. 277–291,Jun., 1999.

[34] R. Krull, S. D’Souza, D. Roy, and D. Sharp,“Designing procedural illustrations: Special issueon acquiring procedural knowledge of a technologyinterface,” IEEE Trans. Prof. Commun., vol. 47, no.1, pp. 27–33, Mar., 2004.

[35] A. Maes and H. Lenting, “How to put theinstructive space into words?,” IEEE Trans. Prof.Commun., vol. 42, no. 2, pp. 100–113, Jun., 1999.

[36] W. Stone, “Repetitive strain injuries,” Med. J.Australia, vol. 10, no. 24, pp. 616–618, 1983.

[37] R. Balci and F. Aghazadeh, “The effect of work-restschedules and type of task on the discomfort andperformance of VDT users,” Ergonomics, vol. 46,no. 5, pp. 455–465, 2003.

[38] L. Mclean, M. Tingley, R. Scott, and J. Rickards,“Computer terminal work and the benefit ofmicro-break,” Appl. Ergonomics, vol. 32, no. 3, pp.225–237, 2001.

[39] T. Williams, L. Smith, and R. Herrick, “Exerciseas a prophylactic device against carpal tunnelsyndrome,” in Proc. 33rd Ann. Meeting of theHuman Factors Society, 1989, pp. 723–727.

[40] J. Sweller and P. Chandler, “Evidence for cognitiveload theory,” Cognition and Instruction, vol. 8, no.4, pp. 351–362, 1991.

[41] N. Marcus, M. Cooper, and J. Sweller,“Understanding instructions,” J. EducationalPsychology, vol. 88, no. 1, pp. 49–63, 1996.

[42] J. Sweller, J. Van Merriënboer, and F. Paas,“Cognitive architecture and instructional design,”Educational Psychology Rev., vol. 10, no. 3, pp.251–296, 1998.

[43] J. Sweller, Instructional Design in TechnicalAreas. Melbourne, Australia: ACER Press, 1999.

[44] G. Miller, “The magical number seven, plusor minus two: Some limits on our capacity forprocessing information,” Psychological Rev., vol.63, no. 2, pp. 81–97, 1956.

[45] A. Baddely, “Working memory,” Science, vol. 255,no. 5044, pp. 556–559, 1992.

[46] M. Chi, R. Glaser, and E. Rees, “Expertise inproblem solving,” in Advances in the Psychology ofHuman Intelligence, R. Sternberg, Ed. Hillsdale,NJ: Erlbaum, 1982, pp. 7–75.

IEEE

Pro

of

Prin

t Ver

sion

VAN HOOIJDONK AND KRAHMER: INFORMATION MODALITIES FOR PROCEDURAL INSTRUCTIONS 13

[47] J. Van Merriënboer, J. Schuurman, M. de Croock,and F. Paas, “Redirecting learners’ attentionduring training: Effects on cognitive load, transfertest, performance and training efficiency,” Learningand Instruction, vol. 12, no. 1, pp. 11–37, 2002.

[48] F. Paas, J. Tuovinen, H. Tabbers, and P. VanGerven, “Cognitive load measurement as a meansto advance cognitive load theory,” EducationalPsychologist, vol. 38, no. 1, pp. 63–71, 2003.

[49] R. Brünken, L. Plass, and D. Leutner, “Directmeasurement of cognitive load in multimedialearning,” Educational Psychologist, vol. 38, no. 1,pp. 53–61, 2003.

[50] A. Glenberg, “What memory is for,” BehavioralBrain Sci., vol. 20, no. 1, pp. 1–19, 1997.

[51] A. Glenberg and D. Robertson, “Indexicalunderstanding of instructions,” DiscourseProcesses, vol. 28, no. 1, pp. 1–26, 1999.

[52] A. Glenberg, “The indexical hypothesis: Meaningfrom language, world, and image,” in Working withWords and Images: New steps in an Old Dance, N.Allen, Ed. Westport, CT: Ablex, 2002, pp. 27–42.

[53] M. Theune, B. van Schooten, R. op den Akker,W. Bosma, D. Hofs, A. Nijholt, E. Krahmer,C. van Hooijdonk, and E. Marsi, “Questions,pictures, answers: Introducing pictures inquestion-answering systems,” in Proc. ACTAS-1 ofX Symposio Internacional de Comunicacion Social,2007, pp. 450–463.

[54] A. Kendon, “Gesticulation and speech: Two aspectsof the process of utterance,” in The Relationship ofVerbal and Nonverbal Communication, M. R. Key,Ed. Mouton: The Hague, 1980, pp. 207–227.

Charlotte van Hooijdonk is a Ph.D. student at TilburgUniversity, The Netherlands. Her Ph.D. research focuses onhow users process and evaluate multimodal informationpresentations using different evaluation methods. She has beena member of the IEEE Professional Communication Societysince 2005.

Emiel Krahmer is a Full Professor at Tilburg University, TheNetherlands. His research interests include multimodal andmultisensory information processing, developing and evaluatinglanguage technology applications, and the analysis of nonverbalcommunication of real and virtual humans.