

Learning loops – interactions between guided reflection and experience-based learning in a serious game activity

B. Cowley,*† T. Heikura§ & N. Ravaja*‡

*Centre for Knowledge and Innovation Research, School of Economics, Aalto University, Helsinki, Finland
†Cognitive Science Unit, Department of Behavioural Sciences, University of Helsinki, Helsinki, Finland
‡Department of Social Research and Helsinki Institute for Information Technology, University of Helsinki, Helsinki, Finland
§Finnish Science Park Association TEKEL, Helsinki, Finland

Abstract In a study on experience-based learning in serious games, 45 players were tested for topic comprehension by a questionnaire administered before and after playing the single-player serious game Peacemaker (Impact Games 2007). Players were divided into two activity conditions: 20 played a 1-h game with a 3-min half-time break to complete an affect self-report form, while 25 also participated in a 20-min reflective group discussion during their half-time break. During the discussion, they were asked by an experimenter to reflect on a set of topics related to the game. We present the analysis of the questionnaires, which illustrates that, contrary to our expectations, the reflection period had a negative effect on the learning of the players as judged by their performance on closed-form questions at levels 1–5 (out of 6) on the Bloom taxonomy of learning outcomes. The questionnaire also included a few open questions which gave the players a possibility to display deep (level 6) learning. The players did not differ significantly between conditions regarding the questions measuring deep learning.

Keywords assessment, experience-based learning, reflection, serious game, technology-enhanced learning.

Whence has it all the materials of reason and knowledge? To this I answer, in one word, from Experience. (Locke, 1690)

The efficacy of serious games as tools for learning has some supporting evidence from the literature (de Freitas, 2006; de Freitas & Griffiths, 2008). However, when dealing with highly complex or controversial content it is not so evident whether a serious game can function as a stand-alone learning experience, or indeed how it is best deployed to deliver a satisfactory learning outcome (O'Neil, Wainess, & Baker, 2005). According to O'Neil et al. (2005, p. 467), there is 'consensus among a large number of researchers with regard to the negative, mixed or null findings of [serious] games research, suggesting that the cause might be a lack of sound instructional design embedded in the games'. We investigate a particular composition of instructional design with a serious game. The motivation for this experiment stems from the EC-funded R&D project TARGET, which is developing a serious game to teach corporate competences in innovation, project management and sustainable global manufacturing. The global competition for highly

Accepted: 10 February 2013
Correspondence: Ben Cowley, Cognitive Science Unit, Department of Behavioural Sciences, University of Helsinki, PO Box 9, Siltavuorenpenger 1A, Helsinki 00014, Finland. Email: ben.cowley@helsinki.fi


doi: 10.1111/jcal.12013

Original article

© 2013 John Wiley & Sons Ltd Journal of Computer Assisted Learning (2013), 29, 348–370

skilled people calls for organizations to invest in retaining and re-training their existing staff in new ways. Current practices (e.g., face-to-face, handcrafted) are often expensive and time-consuming, creating a strong motivation to find new approaches to rapid competence development. Indeed, a clear message from the project's industrial partners (Nokia, Atos) is that time to competence is critical, requiring teaching tools that fast-track the sensitization of learners to new material. Such 'priming' could be efficiently achieved by, for example, a serious game; but only with 'sound instructional design'.

A game is an experience-based learning (EBL) protocol (Gee, 2003), but according to EBL theory, it is the act of reflection that elevates the experience onto a more meaningful level and which constitutes learning (Dewey, 1910; Kolb, 1984; Wertenbroch & Nabeth, 2000; Wetzel & Strudler, 2006). As cost–benefit ratio is carefully controlled in corporate environments, efficacy-based justification must be provided for any proposed instructional features (e.g., a group reflection session) beyond the serious game. A discussion/reflection session increases the time spent on the learning activity, induces more costs by requiring a facilitator and involves more coordination to co-locate learners. Thus, a discussion/reflection period is justifiable based on the educational science literature, yet it cannot be unquestioningly deployed with a serious game, given that the effect of combining these two pedagogical elements is not established.

More specifically, the case of proposing a complementary reflection period to be used in connection with a serious game in company environments is an example of the kind of conflicts of interest that need to be resolved in the iterative serious game design process, where goals (competences to be learned) and means (pedagogy and game mechanics) must consider and adapt to each other (Cowley, Bedek, Heikura, Ribiero, & Petersen, 2012). The objective of this experiment was therefore to investigate the impact of a reflection period alongside a serious game, in order to test the efficacy of this pairing as a pedagogical design.

The game players were asked to play 1 h of the Peacemaker serious game (Impact Games 2007; see Appendix II). The Peacemaker serious game has been designed to teach the player about the nature, causes and possible resolution of the Israeli–Palestinian conflict, and has seen considerable success in its aim (Burak, Keylor, & Sweeney, 2005). Our sample was divided into two conditions: in the second condition the participants went through exactly the same protocol as the first, but at the half-way point they also participated in a reflective group discussion facilitated by the experimenter. The condition 2 players were collocated for the discursive reflection, but at all other times were visually and aurally isolated from each other. Condition 1 was an entirely solo protocol.

Contrary to our expectations, the experiment shows that those who did not have a reflection period achieved better learning results [where learning is demonstrated at the Bloom 1–5 level degree of difficulty (Anderson, Krathwohl, & Bloom, 2001)]. A closer look at these results, however, shows that the counter-intuitive outcome is attributable to the type of learning which is measured, and how it relates to the recursive interaction between reflection period and game play. This suggests that more critical analysis is needed in the design of serious games, as it is non-trivial to obtain the ideal pedagogical outcome.

Background, state of the art

A game, at heart, involves the resolution of player uncertainty via novel information processing (Salen & Zimmerman, 2004), which is an inherently pleasurable experience (Biederman & Vessel, 2006) when it is divorced from significant consequences. This formulation would suggest that the playing of games lends itself well to enjoyable learning – and indeed some believe that forms of learning are almost always part of play (Koster, 2005; Sutton-Smith, 1997). Games generally involve skills (even games of chance can be played more or less skilfully by odds recognition), and Koster (2005), among others, claims that building repertoires of nested skills is the heart of game-play progression. If a game is built on the right kind of skills, they may transfer beyond the game to other contexts, and it is known that skill learning literally has a transformative effect on the player (Scholz, Klein, Behrens, & Johansen-Berg, 2009). Thus, at the strategic level, the serious game player stands to gain a lot, if it can be delivered.

While it has been debated whether or not educational games work (Egenfeldt-Nielsen, 2006; Gee, 2006; Wong et al., 2007), recent positive results (Blunt, 2009; Ritterfeld, 2009) suggest that a better question is: how will a given game work? Will a particular game teach retained, transferable skills that are the ones intended by the designers, or will it teach skills only valuable within the game context? For example, Blunt (ibid.) studied the effect on class grades of using management simulation-style games versus the standard curriculum, basing comparative analysis on standardized course tests (with more than 1000 students). Guillén-Nieto and Aleson-Carbonell (2012) used a pre- and post-test analysis to examine the outcome of a language proficiency game for 50 students. Our design draws on both approaches.

The learning principles that naturally occur in games are covered comprehensively by Gee (2003). Fundamentally, though games facilitate learning, the lesson learned may be well hidden even from the game designer (Cook, 2006), if they are not explicitly aware of both the design of the play experience and the pedagogy. McGinnis, Bustard, Black, and Charles (2008) draw a parallel between the factors describing student engagement and those involved in game play. However, learning does not necessarily follow engagement, as argued by Kirschner, Sweller, and Clark (2006), who point out that discovery, problem-based, experiential and enquiry-based techniques are the main tools of games, but all require prior knowledge on the part of the student to evoke learning. A solution is sought in scaffolding the game, that is, instructional support during learning to reduce cognitive load (O'Neil et al., 2005).

One approach is described by de Freitas (2006, p. 37):

In order for learning outcomes to be achieved it is necessary with simulations (and games) to reinforce learning that has taken place through meta-reflection and post-exercise consideration. This may be done through replaying the simulation, discussion, and dedicated activities that aim to highlight key aspects of the learning.

Many learning games successfully feature reflection, such as the science-education game River City, which Ketelhut (2006) examined in a correlational analysis of self-reported self-efficacy versus in-game investigative behaviours for 96 students. Similarly in Quest Atlantis, on which Barab, Gresalfi, and Ingram-Goble (2010) reported in case study format, reflection seems to be an important determinant of efficacy. However, in our target domain, it is an open question how this discussion and reflection should be implemented alongside game-play features to help players learn, in order to maximize effects for minimal cost.

The importance of reflection is widely understood among teaching professionals and educational scientists (Boud, Keogh, & Walker, 1985; Ewell, 1997). Among the touchstones of the literature, Kolb's (1984) work on experiential learning (or EBL) and his learning cycle brought together a number of significant strains of thought. These include the learning models of Lewin and Dewey as well as the model of learning and cognitive development of Piaget (Kolb, 1984; Miettinen, 2000). The family of EBL, which is perhaps more closely associated with adult education (Miettinen, 2000), has also been influenced by a number of pedagogies developed in the context of childhood education, for example, the pedagogies of Montessori, Hahn, Neill and Freire (Andresen, Boud, & Cohen, 1995; Freire, 1974). This shows that the concepts of experience and reflection enjoy a position of importance among a very wide array of educational theorists and practitioners.

The concepts of experience and reflection have also continued to live on in cognitive constructivism as well as in social constructivism. Cognitive constructivism, which has been greatly influenced by the work of Piaget, holds that learners 'construct' their own knowledge base through experience: experiences guide the creation of schemas – mental models – which are modified and developed through experience-related reflection, for example, through assimilation and accommodation. Social constructivism, on the other hand, emphasizes the role of culture, language and environment, including other people, in learning and development (McMahon, 1997; Vygotsky & Cole, 1978).

The educational design inherent in our approach was derived from the amalgamation of lessons learned in several prior studies. Zapata-Rivera and Greer (2003) found that student reflection was strongest when they were revisiting concepts in discussion with a teacher or peer. In our discussion sessions, the moderator's use of topic guidance was designed to scaffold the players' problem-solving approach by refocusing them on salient aspects of the game, as recommended in O'Neil et al. (2005). Finally, the corporate target domain led us to the particular protocol design used, where extensive post-game reflection would be less desirable than in the student context targeted by games such as River City and Quest Atlantis, since it must occur on company time.

In summary, we employ a type of serious game which is appropriate to the application domain of corporate education. As TARGET is intended to supplement, not replace, the traditional methods of competence training, and because the topic context is deep and complex, we consider the serious game as playing the role of a 'primer' to the topic. This theoretical status of the game shaped the design of the learning protocol for our study, and it is in relation to improving the game's function in this role that we proposed our two conditions. Because this type of game portrays a real-life situation and offers the possibility to learn from life-inspired experiences within the safety of a virtual world, experiential learning is a suitable theoretical basis for the research question of this study. In addition to experiential learning explicitly prescribing reflection, reflection has also been proposed as a key aspect of constructivist learning environments (Jonassen, 1994). The theories presented assume that reflection is a critical part of learning from an experience. For this reason, the goal of this study was to empirically test whether a facilitated group discussion can function as the kind of reflection that induces or facilitates learning, when embedded in a short-duration 'primer'-style lesson mediated by a serious game.

In the rest of this article, we first describe in Section 2 the study methodology, including subsections about the game and its educational meaning, the assessment questions and the experiment procedure. Section 3 details our results and Section 4 discusses them; Section 5 gives conclusions and future directions.

Materials and methods

We can state the primary research question as Hypothesis 1, H1a: that compared with playing the game alone, playing with a reflection period/group discussion would predict improved learning scores. The secondary hypothesis, H1b, implies that playing alone would produce better learning scores than playing with discursive reflection. This research question required the choice of a suitable learning game and assessment method, and a protocol to incorporate the discursive reflection with participants' game play.

Peacemaker game

In order to select a suitable proxy game for our study, ten diverse serious games were shortlisted (Science Supremo, Thinking Worlds: Sims, DoomEd, Making History, Colobots, Typing for Racing, IBM Innov8, Peacemaker, A Force More Powerful, Façade). The games were chosen for their suitability: they had to model a topic relevant to the aims of TARGET, such as conflict negotiation or resource management in realistic scenarios. The interaction mechanics needed to be simple and intuitive yet rich, to facilitate learning in a short duration, such as might be de rigueur in a corporate training environment. Pragmatically, the games needed to be instrumented so all player actions could be logged.

From this list, the Peacemaker game was chosen by three independent assessors using six criteria to evaluate suitability (content complexity, simple controls, relevance to target, no Nash equilibrium, no biasing content, surety of learning outcome). The first and second criteria were weighted (×3 and ×2, respectively) due to their importance to the particulars of the experiment – namely that players had to be able to learn from the game, and that learning had to be assessable.
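The weighting scheme above can be sketched as a simple weighted sum. This is an illustrative reconstruction, not the study's actual instrument: the criterion names are paraphrased from the text, and the assumption that each assessor rated criteria on a 1–5 scale is ours.

```python
# Hypothetical sketch of the weighted suitability scoring described above.
# Criterion names paraphrase the paper's six criteria; the first is
# weighted x3 and the second x2, as stated in the text. The 1-5 rating
# scale is an assumption for illustration.
WEIGHTS = {
    "content_complexity": 3,
    "simple_controls": 2,
    "relevance_to_target": 1,
    "no_nash_equilibrium": 1,
    "no_biasing_content": 1,
    "surety_of_learning_outcome": 1,
}

def weighted_suitability(ratings: dict) -> int:
    """Weighted sum of one assessor's ratings over the six criteria."""
    return sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)

# Example: one assessor's (invented) ratings for a candidate game
example = {"content_complexity": 5, "simple_controls": 4,
           "relevance_to_target": 5, "no_nash_equilibrium": 3,
           "no_biasing_content": 4, "surety_of_learning_outcome": 4}
print(weighted_suitability(example))  # -> 39
```

In practice the three assessors' weighted scores would then be compared or averaged per game; the paper does not specify the aggregation, so none is assumed here.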

Peacemaker is a point-and-click strategy game, where the player acts as Israeli or Palestinian leader and must choose how to react to the (deteriorating) situation, deploying more or less peaceful options from diplomacy and cultural outreach to police and military intervention. For a thorough study on the interaction effects between psychosocial personalities of players and their performance in Peacemaker, see Gonzalez and Czlonka (2010). The learning aspect is embodied in making the correct decisions to balance the interests of stakeholders/factions who are initially antagonistic to the player and each other – in other words, to 'escape the zero-sum game'. Should any particular faction be offended by the player's actions, the compound approval rating of either the Israeli or Palestinian side will drop. If redress measures are not taken, the rating may drop faster and threaten failure. In addition, random events occur which affect the approval ratings, and highlight the contextual nature of the learner's decisions even in a scenario where they are the only human actor. Thus, players are expected to learn a new and more subtle perspective on the Israel–Palestine situation, as well as insights into the requirements of stakeholder management in a potential conflict scenario, and the capacity for dynamic decision making (Gonzalez & Czlonka, 2010). The Peacemaker game's strength as a learning tool has led to its international use.1 Thus, the fit to the TARGET requirements is good: Peacemaker may be played in a short duration without extensive pre-training, and imparts valuable insights into conflict resolution even in that short duration.

An important aspect of the game choice and preparation was the assessment of learning – deriving the questions that were put to the subject. The questions had to be answered pre-game, and so could not reference the game content too specifically; but they also had to be answered again post-game, and so had to be able to elicit the subject's learning of the topic represented by that content. The questions also needed to address all the (Bloom) levels of learning for which the game provides scope.

Questions

Questions had to assess learning at several levels and in line with the theoretical approach to learning already established within the TARGET project. The questions were generated by mining the content of the game, assigning Bloom taxonomy (Anderson et al., 2001) levels based on the complexity of interactions between content in the question itself and the acceptable answers to the question. For instance, first-order interactions exist between a question such as 'What is the religious capital of Israel?' and the answer 'Jerusalem', which would place this question at the first Bloom level. The Bloom levels describe the difficulty of attaining a particular level of learning, the levels themselves being represented (in Bloom's system) by descriptions of the kinds of content one would produce to show attainment of such learning. The format of questions is fully described in Appendix I (Table 1), which also contains question samples and the assessment protocols to derive learning scores.

Protocol

The experimental protocol tests our hypotheses with a between-participants, two-condition design (further illustrated in Appendix II, Figure 2). The second condition tests the effect of a reflection/guided discussion period, and the first is the control. Participants first took a questionnaire to assess their knowledge of the topic area, then engaged in two game-play trials, and after play retook the same questionnaire.

We enrolled a random sample of 45 participants (29 male), derived from subscribers to a number of university mailing lists especially relevant to the topics of project management (business students) and conflict resolution (psychology students). All participants gave their informed consent after a two-stage briefing (by e-mail and in person) which allowed opportunities to put questions to the experimenters. Background information was also obtained on participants' age, gender, ethnic background, language knowledge, education, religious views and computer-game playing frequency. Personal connections to Israel or Palestine and prior knowledge of the subject matter were used as exclusion criteria, to prevent bias in the learning process. Participants were randomly divided into two conditions, 20 in condition 1 (13 male) and 25 in condition 2 (16 male), to achieve a specific-effects randomized controlled trial (RCT) (the condition samples were of unequal size because data were being gathered to address additional research questions not described here, and some condition 2 data were corrupted). The age range of all participants was from 19 to 32 years (M = 24.6, SD = 3.5). Age and gender were balanced between conditions with no significant difference (age: t(43) = 0.0, ns; gender: t(43) = -0.68, ns).
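Balance checks of this kind are independent-samples t-tests over the two condition groups. The following sketch assumes randomly generated placeholder data (the study's raw data are not reproduced here); only the group sizes and male/female counts match the text.

```python
# Illustrative sketch of the condition-balance checks reported above,
# using SciPy's independent-samples t-test. The age values are random
# placeholders; gender is coded 1 = male, 0 = female, matching the
# reported counts (13/20 and 16/25 male).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age_c1 = rng.integers(19, 33, size=20)  # condition 1, n = 20
age_c2 = rng.integers(19, 33, size=25)  # condition 2, n = 25
gender_c1 = np.array([1] * 13 + [0] * 7)
gender_c2 = np.array([1] * 16 + [0] * 9)

t_age, p_age = stats.ttest_ind(age_c1, age_c2)
t_gen, p_gen = stats.ttest_ind(gender_c1, gender_c2)
df = len(age_c1) + len(age_c2) - 2  # degrees of freedom = 43
print(f"age: t({df}) = {t_age:.2f}, p = {p_age:.3f}")
print(f"gender: t({df}) = {t_gen:.2f}, p = {p_gen:.3f}")
```

A non-significant p-value on each check supports the claim that randomization balanced the conditions on that variable.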

Table 1. List of Topics and Questions Used During the Discussion and Reflection Period in Condition 2

Topic                   Question
Game strategy           What did you find difficult about playing the game? Did you fail and have to restart the game, and how did you change your play style then?
Negative impressions    Describe anything you found negative or unlikable about the content of this game.
Surprising impressions  Describe anything you found surprising about the content of this game.
Emotional reaction      How did you react to the game content emotionally? Was it positive or negative?
Game versus reality     What insights did you gain into the different points of view on the conflict?


The experiment procedure was divided into five phases. First, the participants answered 41 questions concerning the Israel–Palestine crisis, which took around 1 h (M = 56.0 min, SD = 17.3 min) – a time that did not significantly vary between conditions (t(43) = 0.85, ns). Second, the participants were seated at computers, visually and aurally isolated from one another in condition 2, and played a game tutorial and the first of two 30-min gaming sessions. The two game sessions were broken by phase 3. For condition 1, this consisted only of answering two quick experiential self-report questionnaires: the Self-Assessment Manikin (SAM; Bradley & Lang, 1994), consisting of three items, and the Perceived Competence Survey (PCS; Guay, Ratelle, & Chanal, 2008), consisting of four items. Condition 2 differed from condition 1 by the presence of a reflection period during phase 3: the players were brought into a group to participate in a guided discourse reflecting on their game experience, in addition to completing the self-report items. In phase 4, the second game session was played. The fifth and final phase of the experiment was to answer the 41 questions a second time, taking on average 34.4 min (SD = 12.7 min), again without significant difference in time taken (t(43) = 0.95, ns). Also in this phase, the participants were given a quick one-on-one debriefing to assess their subjective view of the experience of playing and learning, based on a fixed set of questions asked by the lead author.

The discursive reflection period, where participants were asked to think about and discuss their play experience thus far, was used as an independent variable (IV) against which the 'learning score' of participants was compared in a between-participants design.

The five phases were designed to provide the maximum potential interaction between game and discussion. The tutorial and game play in phase 2 provided the participants in condition 2 with a topic for discussion, and the phase 4 game-play session allowed the discussion to have an impact on play and, consequently, on their learning from the game. This impact would be measured both by the players' game scores and by their post-game assessment scores. The reflection period in condition 2 was also internally structured to support this interaction: the lead experimenter brought the two or three participants together, and used similar forms of wording (Table 1) to raise the topics for comments or discussion. Although the wording in Table 1 often relates to game content, in practice this meant more than just the game mechanics – the aim was to promote reflection on the genuine and potentially affective simulated elements, such as sectarian violence and oppression.

Each topic was proposed and each participant encouraged to respond – they were also encouraged to open a free-form discussion among themselves (rather than simply answering to the experimenter), but this arose only naturally. If the discussion of a topic concluded, the next topic was raised. The content of the reflection period discussions was recorded by the experimenter taking shorthand notes.

As mentioned in the introduction, there was an impetus to minimize the total time of the learning exercises (to mirror the real-life logistical restrictions of industry), which was a primary reason for the short duration of each phase. Nevertheless, we ensured that the total playing time was consistent with that used in other Peacemaker studies (Gonzalez & Czlonka, 2010), where reasonable learning results were reported.

It is of note that the time difference between conditions (condition 2 lasted longer by design) did not significantly predict the learning scores, as reported in Section 3. From this, we may surmise that the learning effects in the two conditions differed only by the presence of the discursive reflection, and thus the experiment was controlled for nonspecific effects.

It is of note that the design includes no delayed learning test to measure retention. Although a retention test might be desirable, it was forgone in this study due to the issue of sensitization. That is, once the test subjects have had a rather intensive experience with the topic of the experiment, it is very likely that they would be sensitized to the topic in a way that would make them more likely to acquire further knowledge on the topic, for example, through conversations and the media. After a period of, say, 1 month, it may already be difficult for the test subjects and researchers alike to identify the source of the knowledge or opinion.

Ancillary data

In addition to game log data and the questions assessing learning, ancillary pieces of subject data were recorded to help explain additional variance, primarily related to player experience and prior biases.


All such data were grouped thematically to facilitate analysis – the groups were timing, motivation, game experience and background. The duration of each protocol phase, including reflection, was recorded, in addition to time spent in the tutorial and the percentage change in time taken to answer the assessment questions. The motivation group included the experience self-reports taken after each game session, consisting of the SAM and the PCS. After phase 4, we used the short version of the Game Experience Questionnaire (iGEQ) (Ijsselsteijn, de Kort, & Poels, no date), the seven factors from which were put in the game experience group. Finally, the background group included age, gender, education, ethnic background, languages, religious views and connection-to-conflict factors. All ancillary variables were tested for independence from the factor condition using one-way analysis of variance (ANOVA). Tests of normality using Shapiro–Wilk were also performed. For more details, see Appendix II.
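These two checks (a one-way ANOVA of each ancillary variable against condition, plus Shapiro–Wilk normality tests per group) can be sketched with SciPy. The data below are random placeholders standing in for one ancillary variable (e.g., a SAM or PCS score); only the group sizes match the study.

```python
# Sketch of the ancillary-variable checks described above: one-way ANOVA
# for independence from the factor 'condition', and Shapiro-Wilk tests
# of normality within each condition. Values are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
var_c1 = rng.normal(5.0, 1.0, size=20)  # condition 1, n = 20
var_c2 = rng.normal(5.2, 1.0, size=25)  # condition 2, n = 25

# One-way ANOVA across the two conditions (df = 1, 43 here)
F, p = stats.f_oneway(var_c1, var_c2)
# Shapiro-Wilk normality test per condition sample
W1, p1 = stats.shapiro(var_c1)
W2, p2 = stats.shapiro(var_c2)
print(f"ANOVA: F(1,43) = {F:.2f}, p = {p:.3f}")
print(f"Shapiro-Wilk: W1 = {W1:.3f} (p = {p1:.3f}), W2 = {W2:.3f} (p = {p2:.3f})")
```

A non-significant ANOVA result indicates the ancillary variable does not differ by condition, i.e., it is independent of the experimental manipulation.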

Finally, assessment of the end-of-experiment interview notes followed a thematic analysis approach, with five themes: general experience with games, playing to the mechanics of the game, paying attention to the serious content of the game, being interested in the game, and feeling that they had learned something. 'Playing to mechanics' means that the subject focused more on 'beating' the game, obtaining a good score or otherwise getting caught up in the formal characteristics of the game than on learning about the important topic of the didactic content. These themes were scored as: mentioned positively = 1, mentioned negatively = -1, or not mentioned = 0. Feeling of having learned something was further divided into two positive levels, as some participants said they 'learned a bit' = 1, while some said they 'learned a lot' = 2.
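The coding scheme above maps directly onto a small scoring function. This is a minimal sketch under our own naming: the theme identifiers are paraphrases of the five themes, not labels from the study.

```python
# Minimal sketch of the debriefing theme-coding scheme: each theme scores
# +1 (mentioned positively), -1 (mentioned negatively) or 0 (not mentioned);
# 'felt_learned' additionally allows 2 ('learned a lot'). Theme names are
# hypothetical paraphrases of the paper's five themes.
THEMES = ["game_experience", "played_to_mechanics",
          "attended_serious_content", "interested_in_game", "felt_learned"]

def score_interview(codes: dict) -> dict:
    """Return a full theme-score vector, defaulting unmentioned themes to 0."""
    allowed = {t: {-1, 0, 1} for t in THEMES}
    allowed["felt_learned"] = {-1, 0, 1, 2}  # two positive levels
    scores = {}
    for theme in THEMES:
        value = codes.get(theme, 0)
        if value not in allowed[theme]:
            raise ValueError(f"invalid score {value} for theme {theme}")
        scores[theme] = value
    return scores

# Example: a participant who played to the mechanics and 'learned a bit'
print(score_interview({"played_to_mechanics": 1, "felt_learned": 1}))
```

Encoding the notes this way yields one ordinal vector per participant, suitable for the between-condition comparisons used elsewhere in the analysis.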

Assessment methods

Assessment strategies were developed for each Bloom level (described here; see Appendix I for more details). It is of note that assessing the set of questions gave two separate sets of results – one for Bloom levels 1–5, and one for Bloom level 6: the open questions.

For the first five Bloom levels we derived a ‘correct’ answer from the game documentation and data mining of empirical records (logs) of games played – that is, a ‘truth’ value in relation to each question was established by studying what the game had shown the players. Using these answers, we assessed fixed-choice responses by scoring the difference between the subject’s first and second response with respect to how much more accurate (or inaccurate) they became, that is, gain scores. Normalized gain scores were considered a non-prejudicial approach with high flexibility, in that the gain could be readily transformed for weighting or data exploration, as advised by Lord, French, and Crow (2009, p. 22) (for more on the assessment protocols, see Appendix I).
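To make the gain-score idea concrete, a minimal sketch of per-question gain (before the Bloom weighting and normalization described below). The function name and the simple +1/0/-1 scoring are illustrative simplifications of the protocol in Appendix I:

```python
def gain_score(pre_correct: bool, post_correct: bool) -> int:
    """Score the change in accuracy from pre-test to post-test:
    +1 for wrong -> right, -1 for right -> wrong, 0 for no change."""
    return int(post_correct) - int(pre_correct)

# A player answers a question wrongly before play, correctly after:
print(gain_score(pre_correct=False, post_correct=True))   # 1
# The reverse transition counts as 'negative learning':
print(gain_score(pre_correct=True, post_correct=False))   # -1
```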

Open questions required a more qualitative approach, whose final quantification was not directly comparable (on an interval scale) to the level 1–5 questions. Instead of basing assessment on empirical analysis of game data, these responses required a series of guided judgements, and we maintained consistency and impartiality by using detailed criteria applied by two separately working assessors, adjudicated by a third (see Appendix I).

Unfortunately, the level 6 results did not have very high variance, because many participants could not be considered to have demonstrated this high level of learning with their answers (and so their assessed score was 0). The inter-rater reliability for the six level 6 questions was also not good – the question ‘What is your current understanding of the causes for the Israeli-Palestine conflict?’ had Cohen’s kappa = 0.44, which is considered ‘moderate agreement’, but all others scored less than 0.4, that is, poor to fair.
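Cohen’s kappa, used here to quantify agreement between the two assessors, corrects raw percent agreement for agreement expected by chance. A minimal sketch; the two rating vectors are invented 0/1 judgements, not the study’s data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items."""
    n = len(rater_a)
    # observed proportion of agreement
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # chance agreement from each rater's marginal label frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical 'demonstrates level 6 learning' judgements on ten answers:
a = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
b = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
print(round(cohens_kappa(a, b), 2))
```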

Before summation to a final learning score, gain scores for each question were weighted by the Bloom level rating of the associated question (giving more weight to questions that theoretically indicated a higher level of learning) and then normalized. We tested permutations of the weighting options for each Bloom level, including linear versus exponential scaling, and the latter option provided the most differentiated learning outcomes [for more rationale on linear vs. non-linear weighting, see Gribble, Meyer, & Jones (2003, p. 26)].
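The exponential weighting by Bloom level might be sketched as follows. The base of 2 and the example gain values are assumptions for illustration; the paper does not specify the scaling constants:

```python
def weighted_total(gains_by_level, base=2.0):
    """Sum per-question gain scores, weighting each question's gain
    exponentially by its Bloom level, so higher levels count more."""
    return sum(base ** (level - 1) * gain
               for level, gains in gains_by_level.items()
               for gain in gains)

# Hypothetical per-question gains for one player, keyed by Bloom level:
gains = {1: [1, 0, 1], 2: [0, -1], 3: [1], 4: [0], 5: [1]}
print(weighted_total(gains))
```

With base = 1 the same function reduces to an unweighted sum, which is the linear-vs-exponential comparison described above.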

Thus, our final dependent variable (DV) was the exponentially weighted normalized gain score, with parametrically increasing importance of Bloom levels, which we will refer to as gainExp (overall M = 12.02, sd = 12.06, range = -13.56 to 41.59). The gainExp distribution (split by condition) was normal by Kolmogorov–Smirnov test, where condition 1 was


D(20) = 0.12, ns, and condition 2 was D(25) = 0.08, ns. For the two conditions, the variances were equal by Levene’s test, F(1, 43) = 1.24, ns.
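Levene’s test, used above to check equality of variances between conditions, is an ANOVA F-test on the absolute deviations of each score from its group mean. A pure-Python sketch on invented score vectors (the actual analysis was run in SPSS):

```python
def levene_W(groups):
    """Levene's test statistic (mean-centred variant): a one-way ANOVA
    F computed on absolute deviations from each group's mean."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    # absolute deviations from each group's own mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    zbar_i = [sum(zi) / len(zi) for zi in z]        # group means of z
    zbar = sum(sum(zi) for zi in z) / N             # grand mean of z
    between = sum(len(zi) * (m - zbar) ** 2 for zi, m in zip(z, zbar_i))
    within = sum((x - m) ** 2 for zi, m in zip(z, zbar_i) for x in zi)
    return ((N - k) / (k - 1)) * between / within

# Hypothetical gainExp scores for two conditions:
cond1 = [4, 17, 25, 30, 9, 12]
cond2 = [2, 8, 10, 6, 7, 11]
print(round(levene_W([cond1, cond2]), 2))
```

The statistic is compared against an F(k-1, N-k) distribution; a non-significant result, as reported above, supports the equal-variance assumption.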

Results

The learning scores and associated variables are shown for all participants in each condition in Table 2. Based on these data, we derive an effect size for the trial of Cohen’s d = 2.7. Interestingly, the correlation between answer quality and the percentage change in answering time from the first test to the second is negligible [Pearson gives r(45) = 0.1 for gainExp and r(45) = 0.2 for open]. The percentage change in length of open answers shows a similar lack of straightforward relationship to open scores, r(45) = 0.1.
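For reference, Cohen’s d is the difference of group means divided by the pooled standard deviation. A sketch with invented score vectors (not the study’s data):

```python
def cohens_d(g1, g2):
    """Cohen's d: mean difference divided by the pooled sample sd."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    v1 = sum((x - m1) ** 2 for x in g1) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in g2) / (n2 - 1)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical condition score vectors:
print(round(cohens_d([20, 25, 15, 22], [10, 12, 8, 14]), 2))
```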

To test H1a and H1b we used the SPSS general linear model univariate procedure, with gainExp score as the DV and condition as a fixed factor. This was our first model. To fully explore the data gathered, we also built a series of similar models by including one of the ancillary data groups at a time: timing, motivation, game experience and background. Using groups was necessary to avoid too many IVs in a model. For all models, condition was included as a fixed factor; thus, we had five models, one for each group and one with condition alone. Our hypotheses are tested by all of these models, the difference between each being the expected source of additional variance.

In fact, in all models the result for the factor condition supports H1b and negates H1a. For the first model without covariates, condition was significant with F(1,43) = 8.1, p < .01, η² = .16. For all other models, condition was still significant, p < .01, but the effect was strongest with the timing group, F(1,38) = 12.3, η² = .24. This suggests that the within-groups differences in the time that players spent in reflection discussions, or took on the tutorial or answering questions, helped explain the most additional variance (although the timing measures do not significantly predict learning by themselves).

Because the direction of this result was unexpected, further analyses were conducted using the gainExp score from the set of questions at each individual Bloom level, for example, all level 1 questions only. Similar models were used without covariates. Strikingly, only level 5 was significantly different between conditions, F(1,43) = 9.6, p < .005, η² = .18, while level 2 was marginal, F(1,43) = 3.7, p < .1, η² = .08.

The core relationship between condition and gainExp learning score is shown in Figure 1, illustrating how the reflection period resulted in diminished learning. Here, the ordinate shows the dependent measure of Bloom level 1–5 learning scores. The abscissa shows the covariate age, binned by cut points at the mean and ±1 standard deviations to support better visualization.2 Thus, the box plots are the ascending age groups of subjects plotted against their learning performance, and split (coloured) by condition. In all, the figure shows how the condition 1 subjects were generally getting higher learning scores than condition 2 subjects, but the effect is age dependent. The inverse correlation between learning and age is also more pronounced for condition 1 (r(20) = -0.47) than condition 2 (r(25) = -0.26).
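The binning used here (cut points at the mean and ±1 standard deviation, giving four bins) can be sketched as follows; the age values are invented for illustration:

```python
import statistics

def bin_by_mean_sd(values):
    """Assign each value to one of four bins, with cut points at
    mean - 1 sd, mean, and mean + 1 sd."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    cuts = [m - s, m, m + s]
    # bin index = 1 + number of cut points the value exceeds
    return [sum(v > c for c in cuts) + 1 for v in values]

ages = [21, 24, 26, 29, 31, 35, 42, 55]
print(bin_by_mean_sd(ages))
```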

As mentioned above, the assessment of open questions (designed to test Bloom level 6 learning) did not return easily interpretable results. There was no significant difference between conditions for open questions, as shown in Figure 2, although the mean value for condition 2 was less than for condition 1.

Table 2. Bloom Level 1–5 (gainExp) and Level 6 (Open) Learning Scores for All Participants in Conditions 1 and 2. Participants Have Been Ordered by Their gainExp Scores, and for Comparison We Also List the Percentage Change in Open Answer Length and Total Time to Answer, From First to Second Test

                      Condition 1                                Condition 2
        GainExp  Open  Open length  Total time to    GainExp  Open  Open length  Total time to
                       % change     answer % change                  % change     answer % change
mean    17       0.6   73           58               8        0.55  79           66
sd      13       2     34           15               10       3     26           17
median  18       0     72           63               7        0     75           66
max     42       9     161          84               27       10    129          100
min     -14      0     15           30               -7       0     34           33

Analysis of reflection period notes, although entirely qualitative, indicates a general flow from discussion of the difficulties of the game toward more personal impressions such as positive and negative emotional reactions. Strategy and tactics served as a useful opening topic, being relatively ‘safe’ ground for participants to voice opinions and learn about their fellow participants’ experiences – much of this was dominated by recounting failures and the responsive strategies being considered. Two-thirds of groups expressed surprise at the complexity of the real situation as modelled by the game; almost as many also mentioned that their in-game failures gave them new perspective on the difficulty of leading and of reconciling opposed parties. Reported positive and negative emotional reactions to content were quite minimal: one-third were sad or sympathetic at the Palestinian plight. In contrast, nearly all reported negative feelings of frustration at the difficulty of reconciling both sides, and especially at the oppositional reception of their peaceful initiatives.

Figure 1 Box plot of gainExp learning scores, split by condition and distributed over four levels of age. The continuous age variable was binned using cut points defined by the mean and ±1 standard deviations for the purposes of visualization. Maximum score is 38, as there are 38 normalized fixed-format questions. The linear trend of each condition is for scores to decrease as age increases, although each condition also contains an exception to this (e.g., the grey circle for condition 2, age bin 3 is an outlier who is clearly a top scorer in this condition). In each bin, we tested the difference between conditions for gainExp scores by univariate ANOVA, and only bin 2 was significant, F(1,11) = 8.1, p < .05, η² = .47.

Figure 2 Mean Bloom level 1–5 (gainExp, shown as textured boxes) and level 6 (open, shown as circles joined by a vertical line) question scores for participants split by condition. Error bars (95% CI) are shown as shaded bars for gainExp scores, and as T-bars for open scores.

Discussion

Based on the authors’ process to derive the assessment questions, we expect that scoring well on the fixed level 1–5 questions would require close attention to the mechanics of game play. On the other hand, scoring well on open level 6 questions would require more attention to the ‘big picture’ lessons implicit in the narrative and setting. Based on this assumption and our observations, we have a causal interpretation of the results.

The observed result shows the clear relationship between condition and the Bloom level 1–5 learning scores, holding significant with any set of covariates. The specific divergence at level 5, as shown in Figure 3, places the emphasis on the strategies in the game with their probable outcomes (also see Appendix I), and reflects on participants’ ability to evaluate the modes and motives of actors in the game’s scenario.

Level 5 questions relied primarily on two key skills: recognizing and recalling facts important to the strategic situations, and evaluating how individual facts relate to and affect each other. Thus, the result may be explained by considering the effect on both working memory, and immersion in the game context, caused by the reflection period occurring between game sessions. The priming mechanism of a game works by immersing players in a ‘semiotic domain’ (Gee, 2003). By contrast, the reflection period asked players to focus within a new context on what they had learned while playing the game, thus acting to reorient player attention. Focus was shifted away from the context of the game to the context of discussing the game. This would have changed the content of their working memory, and affected their recall ability later. While the instructional design was explicitly aimed at some kind of reorientation, we can now see that this design has the potential to diminish learning at some levels without definite benefits at other levels.

The structure of memory is a matter of ongoing debate (Gobet, 1998; Mogle, Lovett, Stawski, & Sliwinski, 2008), but it is clearly divided into two or more parts: short-term memory, long-term memory, and an intermediate structure or process, working memory. These differ greatly in capacity, working memory being quite limited and so primarily affected by the demands of the immediate context. In addition to the discussion taking over some ‘space’ in working memory and ‘bumping’ out some (more irrelevant) Peacemaker information, we surmise that the discussion also acted as a selection function for what was discarded and what was kept as a candidate for long-term memorizing. Thus, the conversational focus, whatever it was, diverted attention from the purely systematic lessons of the game, such as strategies of conflict negotiation, which require a rather deep (uninterrupted) concentration to internalize. Though mechanics were discussed in all groups, the open format and inexpert comments (somewhat) precluded deep examination of the causes and motives at play. Therefore, those players without a strong sense of the simulated political landscape by the end of one session were not aided by cohorts to deeper insights. Indeed, they may have been distracted from forming such insights.

Because individual players vary in their skill and loci of attention, the suggested reorientation should have affected them in varying degrees. We see evidence for this in the high standard deviation (mean of per-group sds = 9.6) of gainExp scores within the separate reflection groups. In each group, there is one participant with much higher or lower scores than her two companions, suggesting that some players ‘led’ their respective discussions while some players ‘followed’. If this were not the case, we would expect the scores to contain some exceptions to the pattern, but the pattern holds: the lowest sd is >5, and the sd of sds is itself very low at 2.5. A few players also commented in end interviews that game mechanics became more transparent after discussion at the halfway point. Notes taken during reflection support this, although participants did not note this explicitly during discussions.

Figure 3 Box plot of gainExp scores at the individual Bloom levels 1–5, with 95% confidence interval error bars. *p < .1; ***p < .005.

So it may be that the discussion acted as a process whereby the pieces of information for which there would be no long-term use were simply discarded. For the condition 1 group it may be that a similar discarding process happened later, as a result of resuming normal life. To put it another way, the absence of the reflection period in condition 1 allowed better concentration on, and short-term retention of, the small details in the game.

Of course, this interpretation does not cover all the options. It is also possible that the assessment questionnaire may have contained too many items prone to letting players find the correct choice by comprehensive trial-and-error style playing. For instance, some questions asked which game actions would bring positive results (game score), and in theory, a player (playing to mechanics) could have discovered the best course of action with no appreciation for the underlying principle which made the action successful. A player who was encouraged by the group reflection to ponder the underlying principles – that is, why certain actions lead to success in the game – may have played more slowly and thus had less chance to correctly respond to all question items dealing with game actions and their consequences.

Given the paucity of valid displays of level 6 learning and the subsequently low variance in the scores, combined with low inter-rater reliability, we cannot draw strong conclusions from the open results. What we can say is that because open scores were not significantly predicted by condition and were of low magnitude, our study indicates that deep learning is not probable with a short-length didacticism, in either condition. Interestingly, where level 6 learning did occur in condition 2, 66% was among members of the same discussion groups; these high-performing groups also shared the same reflection duration (17 min, also equal to the median of all reflection durations). Open questions also helped indicate the strategic significance of the game play for the participants. Post-test answers that demonstrated level 6 learning mostly shared a feature of pointing out more details with better accuracy, such that (for example) the average number of new or corrected facts from first to second answer in open question 1, ‘What is your current understanding of the causes for the Israeli-Palestine conflict?’, was 2.6. While these responses certainly demonstrated the required synthesis of observations from play, the implication was that the only way to formulate such answers was by exploring the many details the game had to offer. This implies that this game has not overcome the need for careful attention and fact retention in learning.

The final results explored are the relationships between the gainExp/open learning scores and the end-of-experiment interviews. Some of these reflected in interesting ways on the gainExp results. An interesting result arose from the theme ‘Played to mechanics’ – shown in Figure 4 plotted against gainExp and open scores. At Bloom level 6, playing style clearly diverges between conditions, with condition 2 high scorers playing less to mechanics than their condition 1 counterparts. At lower Bloom levels the conditions are in agreement. This gives support to the given interpretation of the observed condition-predicts-gainExp result – concentrating on the minutiae of the game play aided participants in learning topics for the more narrowly focused Bloom 1–5 questions, but only distracted when it came to the ‘big picture’ questions.

Reporting having paid attention to the content gave a result in a surprising direction. For condition 1 participants, this theme correlated positively with open scores (r(20) = 0.23), but for condition 2 there was no real correlation (r(25) = -0.057). Thus, condition 1 participants related to the game as one might expect, learning better on the fixed questions when they paid more attention to the mechanics and learning better on the open questions when they looked more at the content (there is not necessarily a mutual exclusion here). Yet for condition 2, on open questions it did not seem to matter whether they paid attention to the game’s content, and playing to mechanics was a negative behaviour. Thus, learning on the open questions could potentially be attributable to the reflection session.

In summary, it seems quite likely that reorientation was an underlying cause of condition 2’s lower learning scores, but there are multiple possible structures of causal dependency between the IVs and DV. We have considered four possible causal interpretations of our result. Scenario one is that the condition 2 participants could not maintain the level 1–5 subject matter in working memory throughout the ~20 min reflection period. A second possible scenario is that the group discussion altered their focus of attention in the game – during solo play (in either condition), the player may form an opinion of what content or playing actions are important and focus on them. After participants shared a reflection period, their individual perspectives may have mutually aligned, leading them to refocus with respect to game content and mechanics. This may have been unfavourable for achieving the kind of learning measured by the questionnaire: a strongly realigned player could have been forced to almost ‘begin again’ their learning in game session 2.

The third scenario is that the assessment was not well suited to the pedagogical design. The fourth scenario: if particular playing styles support particular learning styles (Bateman & Boon, 2005), so that for instance those who ‘played to mechanics’ did better at level 1–5 questions, then post-reflection reorientation adjusted playing styles with a similar effect to the second scenario.

Any study of learning in games involves controlling a multi-modal, multidimensional experiment. This implies that multiple studies to examine different aspects of the learning experience would be needed before drawing firm conclusions. Our protocol did not balance time and cognitive demand between conditions, suggesting an alternative experiment design to test the effect on learning when a distractor task is performed between play sessions, of equal length to the discursive reflection. Another option is an interstitial session of game play (again of equal length). However, there are issues with each of these control options. The duration of the phases in our protocol is also quite short, as motivated by the target domain.

Nevertheless, despite these limitations and the small sample size, our study was powerful enough to detect a large effect, suggesting that multiple small RCTs, each tightly focused on a specific combination of pedagogic design and gaming, would bear fruit.

A potential approach to these future studies would be to use a longitudinal design, with multiple playing sessions over some weeks/months, with a similar phased protocol to ours. However, this would require careful recruitment of subjects to ensure compliance without selection bias. The protocol design could also be reconsidered: as it is, with short playing phases around a reflection period, it is not a meditative self-reflection but rather aims to follow de Freitas (2006, p. 37): ‘replaying the simulation, discussion, and dedicated activities that aim to highlight key aspects of the learning’. This approach is tailored to the domain, but our experimental result indicates care must be taken in the design of support activities, be it reflection/discussion periods or otherwise.

Figure 4 Plot of interview theme ‘Play to mechanics’ against both types of learning scores: gainExp scores on the left (a) and open scores on the right (b). Linear fits per condition were weak (R² between 0.017 and 0.042).

While the limitations imply the need for corroborative studies, we must look at the protocol to explain the counter-intuitive result obtained. The general implication of these results is that uninterrupted attention to, and enjoyment of, the game seems to be a prerequisite for learning at Bloom levels 1–5. For any proposed learning game, if the nature of the pedagogy is such that levels 1–5 are important, as with simulation games for learning basic skills, then it is important to consider our results in designing the game and its application.

Conclusions

We attempted to address the general question of how pedagogy can be structured in a serious game, with focus on a mode typical to corporate training: a topic/content primer game with scaffolding by guided discursive reflection. Specifically, the question was whether a guided group reflection period inserted into the game play can improve learning scores. In fact, we found that the result was to diminish learning scores by comparison with a condition where there was only play and no reflection. The learning test instrument was assessed in two parts: fixed-format questions and open questions. The significant result applies only to fixed-format questions, mainly because many participants did not display Bloom level 6 learning.

We do not suggest that reflection is unsuitable for serious game-based pedagogy; rather, this result shows that in the learning loop between play and reflection, the pedagogical elements will interact recursively and unintended effects may arise.

Acknowledgements

The authors would like to thank research assistants Lauri Janhunen, Siiri Peli and Svetlana Kirjanen for conducting this study, as well as Marco Rapino and Simo Jarvela for technical assistance. We would also like to warmly thank game producer Eric Brown for his help in using the Peacemaker game and plug-ins, and Michael Bedek for valuable input on drafts of the paper.

This work has been funded by the EU project TARGET (IST 231717). This publication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Notes

1 See, for instance, http://gaming.wikia.com/wiki/PeaceMaker_(video_game) and also http://phe.rockefeller.edu/docs/PeresCenterPressRelase.pdf

2 As there is not a large loss of information by considering a population in terms of age bands rather than individual years, binning the variable age as in Figure 1 helps envision the changing mean gainExp score across ages.

3 Competing but equally correct answers are not what was initially listed in the game documentation (which gave the original basis for forming the question), but were proven to be equally valid by empirical means (mining the game log files of participants).

Appendix I

Table A1 outlines the relationship between the types of questions, the Bloom level assigned to them, and the game data or experience that the question addresses – it also lists the number of such questions asked. A sample list of questions and complete assessment protocols follow.

The following is a representative sample from the list of 41 questions (seven questions also had two parts), which were asked about the Peacemaker game to assess what the player learned from playing as the Israeli prime minister. They are numbered by the order in which they appeared to participants. Every question tests a different form of learning and is related in its assumptions to the correspondingly numbered level within the Bloom taxonomy: see Table A1 and Anderson et al. (2001). See the ‘Assessment protocol’ section below for details of the assessment protocols for each question type.


Qualitative questions

These questions contain open-answer elements designed to enable assessment of the higher levels of subject learning.

1 What is your current understanding of the causes for the Israeli–Palestine conflict? – BLOOM 6

4 Give your current estimate of the possibility of future peace in the region. Indicate the probability using a number from 1 to 9, with 1 being impossible and 9 being absolutely certain. – BLOOM 6

Peace: 1 (Highly improbable) 2 3 4 5 (Unknowable) 6 7 8 9 (Highly probable)

i) Describe why you picked that number.

Quantitative questions

Below is a list of sample quantitative questions. The answer to each question is listed directly below it. Following are the assumptions behind the question – these include any assumption supporting the validity of the answer, plus the necessary condition for the question to work in the experiment, that is, how the player learns the information.

Table A1. Relationship Between Types of Question, Their Bloom Level, and the Game. The Final Column Is the Number of Questions of That Level That the Player Had to Answer (the Number in Parentheses Shows How Many Separate Items Were Asked in Multi-Part Rating Questions)

Type of question | Bloom level | Game data covered | Num
Recall/multiple choice questions | 1 or 2 | Geography, groups and leaders, polls, tutorial, timeline, intro videos | 28
Application/active recall | 3 | The effect of actions available to the player. Security: retaliation, suppression, public order, prisoners. Politics: international, domestic, cabinet, Palestinian authority, workers, trade. Construction: aid (medical, infrastructure, civil), domestic, settlements, wall, refugees | 5
Analysis of given data, requiring inference or induction | 4 | Sequences of actions where the outcome can be defined with certainty, given the particular state/stage of the game; which actions do or don’t work together | 4 (28)
Evaluation or judgement of given game scenarios | 5 | Strategies in the game with their probable outcomes | 4
Synthesis of game data, or use of game scenario, to relate real-world situation | 6 | Personal expressions/descriptions of topical knowledge/lessons from game experience | 6


10 ‘A high level of trust and mutual respect between the Israeli PM and Palestinian Authority President is vitalto successfully build peace.’Rate the following actions/policies by the Israeli PM for their effectiveness in building trust.a) Security: securing/patrolling areas with police 1 2 3 4 5b) Politics: political speech making 1 2 3 4 5c) Construction: expand Israeli settlements 1 2 3 4 5d) Security: engage and destroy militant targets 1 2 3 4 5e) Construction: authorize aid packages to the Palestinians 1 2 3 4 5f) Politics: push for improved commercial and cultural relations 1 2 3 4 5g) Security: restrict and curtail Palestinian population 1 2 3 4 5h) Politics: negotiation and consultation between leaders 1 2 3 4 5i) Construction: Israeli domestic welfare initiatives 1 2 3 4 5answer) Ratings given in two answers (x, y) are evaluated by (y - x)*w, where w is given below:a = -1, b = 1, c = -1, d = -1, e = 1, f = 1, g = -1, h = 1, i = -1These weights were derived from the responses of the AI to actions corresponding to those named, in the gamesplayed by test participants.We have estimated as follows (this is only a guess):a = 3, b = 4, c = 1, d = 2, e = 3, f = 5, g = 1, h = 4, i = 2.assumptions) We assume the correctness of the answer based on observation/play. Player can infer fromobserving relevant variables while trying this strategy. 
– BLOOM 411 Which of the following regional countries share a border with the state of Israel?Cyprus [no]Egypt [yes]Iraq [no]Jordan [yes]Lebanon [yes]Saudi Arabia [no]Syria [yes]Turkey [no]Yemen [no]answer) Egypt, Jordan, Lebanon, Syriaassumptions) Player observes large-scale map in game, does not know relevant geography – BLOOM 113 Name two fears of the Israeli public.answer) Palestinian militant attacks, instability, economic lossassumptions) Player reads about group – BLOOM 214 Below are four examples of a sequence of actions/policies that can be done by the Israeli Prime Minister.Tick each sequence if you believe that the final effect (of all the actions in it) would be to please both the Israeliand Palestinian sides and overcome the zero-sum effect.In the Israeli–Palestine conflict, as in the game, it is often the case that a particular action or policy by a leaderwill be disapproved by one side as much as it is approved by the other. This is known as the zero-sum effect.Now, rate each sequence for how well it would please both sides at the same time.The score indicates how pleased both sides are after all the actions in the sequence are done. So a score of 1counts as ‘really displeases one or both sides’ and score of 5 counts as ‘really pleases both sides’.a) 1 2 3 4 5Increase the number of police patrolsOrder the Israeli Army to destroy militant infrastructureEngage in political talks with the Palestinian President to demand anti-militant action

B. Cowley et al.362

© 2013 John Wiley & Sons Ltd

Give direct funding to construct Palestinian medical institutionsb) 1 2 3 4 5Order the removal of existing Jewish settlementsEnact security policy to reduce Palestinian curfewsEnact policy to decrease trade restrictionsEnact a domestic economic stimulus packagec) 1 2 3 4 5Engage in political talks with the Palestinian President to ask for anti-militant supportIncrease the number of police patrolsEnact policy to decrease trade restrictionsOrder the Israeli Army to secure an aread) 1 2 3 4 5Enact a policy to allow Palestinian refugees to immigrate back to their homelandEnact an initiative for more cross-cultural projectsThrough the UN, give funding for Palestinian medical institutionsMake a speech to the Israeli people calling for cooperation with Palestiniansanswer) Where scores are written (Israeli approval, Palestinian approval), test-playing the above actionsequences (from the first move of the game) achieved the following scores:a = (15, 25) b = (-6, 23) c = (8, 5) d = (-5, 5)/(-4, 5)/(-3, 10)Rating each action individually across all played games, and averaging, gave scores of:a = (4.69, 3.81) b = (-0, 8, 8.56) c = (3.77, 1.74) d = (-0.29, 3.25)Only c) should be ticked, and the ratings given (x, y) are evaluated by (x - y)*w, where w is given below:a = -1 b = -1 c = 1 d = -1assumptions) We assume the correctness of the answer based on observation/play. Each strategy was testedthree times. Player can infer from observing relevant variables while playing. 
– BLOOM 5

19 Of all the interested parties (represented in the game as groups and leaders), ____________ are most opposed to your plans (i.e., have the lowest approval of you in the game).
answer) Militants – name of any one should suffice, for example, Hamas
assumptions) Player notes the approval ratings at beginning of game
– BLOOM 2

32 What is the general feeling among the Israeli public toward the Palestinian public (as simulated at the beginning of the game)?
a) Apathy b) Anger c) Despair d) Compassion
answer) a)
assumptions) Player sees the event reporting this info, or infers from Israeli sympathy poll.
– BLOOM 1

33 What could be done to improve this? Tick any that apply.
a) Focus on securing of borders and segregation
b) Focus on diplomatic relations with foreign countries
c) Focus on intercultural initiatives and trade
d) It is impossible
e) Focus on Israeli settler issues
f) Focus on domestic education and welfare
g) Focus on destruction of militants
h) Focus on green issues
answer) a) and c), and to a lesser degree f)
assumptions) Player infers from Israeli sympathy poll after playing correct strategy.
– BLOOM 3
This question should only be scored if Q32 was answered correctly.


Assessment protocol

A theoretical issue for assessment is the assumption taken regarding negative learning – whether or not the participant is capable of showing a decrease in overall knowledge about the subject area over the period of the experiment (as judged by their gain score). Rationally, this should only happen if the game-play protocol teaches them wrong information, but because the questions are derived from, and the answers judged against, the game’s content, this should not be possible. However, considerations of player non-rationality mean we must include the possibility of ‘negative’ learning (it is clear that they could, in principle, simply misremember information learned in play by the time they take the post-test). Gain scores are thus potentially negative – if answers go from right to wrong they are given negative points. However, this ‘negative learning’ score can be treated as zero in post-processing, achieving the same effect as an initial assumption of no negative learning in an exploratory analysis.

For questions (of levels 1–5) that requested specific information but allowed open answers (free text input) we defined a synonymy set, that is, a set of answers that could legitimately be given in lieu of the ‘correct’ answers. Rating questions were assessed by a formula that preserved the magnitude of the subject’s response preference without giving an arbitrary ‘truth’ value to the rating item. All level 1–5 questions thus obtained a gain score. These were then weighted. Initially, weights were the product of the gain score and the number of the Bloom level, which gives a linear increase in importance over Bloom levels. Yet the ‘learning value’ of the Bloom levels is not defined in a scalar sense, only as ordinals, so there is more than one option supported by theory for weighting each level. For instance, the importance of learning at higher levels could be considered parametrically greater than lower levels (because mastery at each level is considered to require mastery at all the lower levels first): applying this changes the weight values from linear scaling [1, 2, 3, 4, 5, 6] into exponential scaling [1, 2, 4, 8, 16, 32].
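The weighting step described above can be sketched in code. This is a minimal illustration under our own naming conventions, not the study’s actual analysis scripts; the clamping option corresponds to the no-negative-learning assumption discussed earlier.

```python
# Sketch of the gain-score weighting described above (illustrative only;
# function and variable names are ours, not from the study's analysis scripts).

# Linear weights [1,2,3,4,5,6] and exponential weights [1,2,4,8,16,32].
LINEAR_WEIGHTS = {lvl: lvl for lvl in range(1, 7)}
EXPONENTIAL_WEIGHTS = {lvl: 2 ** (lvl - 1) for lvl in range(1, 7)}

def weighted_gain(gains, weights, no_negative_learning=False):
    """Sum per-question gain scores weighted by Bloom level.

    gains: iterable of (bloom_level, gain_score) pairs; gain_score may be
    negative (answers going from right to wrong). If no_negative_learning
    is True, negative gains are treated as zero, matching the exploratory
    assumption of no negative learning.
    """
    total = 0
    for level, gain in gains:
        if no_negative_learning:
            gain = max(gain, 0)
        total += weights[level] * gain
    return total

gains = [(1, 2), (3, -1), (5, 1)]
print(weighted_gain(gains, LINEAR_WEIGHTS))        # 1*2 + 3*(-1) + 5*1 = 4
print(weighted_gain(gains, EXPONENTIAL_WEIGHTS,
                    no_negative_learning=True))    # 1*2 + 4*0 + 16*1 = 18
```

The choice between the two weight dictionaries changes only the relative importance of the Bloom levels, not the per-question scoring itself.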

• For single response multiple choice (e.g., Q32):
– 1st and 2nd response are the same = 0 points.
– 2nd response is correct and 1st response is not = 1 point.
– 1st response is correct and 2nd response is not = -1 point.

• For multi-response multiple choice (e.g., Q11):
– For every response that is the same both times = 0 points.
– For every correct response in 2nd answer (that is not in 1st answer) = 1 point.
– For every correct response in 1st answer (that is not in 2nd answer) = -1 point.
– For every incorrect response in 1st answer (that is not in 2nd answer) = 1 point.
– For every incorrect response in 2nd answer (that is not in 1st answer) = -1 point.

• For single answer ‘open’ questions (e.g., Q19) – the right answer, or a synonym, or a competing but equally correct answer,3 is in 2nd response but not in 1st response = 1 point.

• For multi-answer ‘open’ questions (e.g., Q13) – every correct answer, or a synonym, or a competing but equally correct answer,3 in 2nd response that is not in 1st response = 1 point.

• Rating questions (e.g., Q10) are assessed by a formula detailed in the question above. The procedure works as follows (please refer to question 10 above, when reading below):
– So, for example, in this one rating-type question we have these 9 items, with a weight attached (either -1, 0, 1) which was derived from the data of game players by asking, for each rating item, what was the reaction in the variable of interest after the action that is cited in the rating item (in question 10 the variable of interest is the relationship between Israeli and Palestinian leaders, defined by a scalar in the game).
Thus we do not pre-judge what score the rating should be, but rather only whether the action associated with the rating was positive, negative or neutral (with respect to the question asked). This is defined by our weights w. By subtracting the first score from the second, we get a magnitude and a sign. Say in item 10.a (with weight -1) the subject responds first with 4, second with 2. Then the calculation would be

(2 - 4) * (-1) = 2


The subject has downgraded his rating of that action (which was defined as a bad action for the purpose of building trust, based on the data), from more positive (4) to more negative (2), so his score is +2, preserving the magnitude of the change. If he had answered in the opposite way, first 2 and second 4, he would be upgrading his estimate of the quality of the (bad) action, and thus would get a score of

(4 - 2) * (-1) = -2

Thus, we preserve magnitude without giving an ad hoc ‘true’ value to the rating item.

• The procedure for assessing open questions is detailed in the next section.
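The scoring rules listed above can be summarized in a short sketch. The function names are our own and this is not the study’s analysis code; it simply restates the single-response, multi-response and rating rules as given.

```python
# Sketch of the per-question gain-scoring rules above (illustrative only;
# function names are ours). 'first'/'second' are pre- and post-game responses.

def score_single_choice(first, second, correct):
    """Single-response multiple choice (e.g., Q32)."""
    if first == second:
        return 0            # same answer both times
    if second == correct:
        return 1            # changed to the correct answer
    if first == correct:
        return -1           # changed away from the correct answer
    return 0                # changed between two wrong answers

def score_multi_choice(first, second, correct):
    """Multi-response multiple choice (e.g., Q11). All arguments are sets."""
    score = 0
    for r in second - first:        # responses added in the 2nd answer
        score += 1 if r in correct else -1
    for r in first - second:        # responses dropped from the 1st answer
        score += -1 if r in correct else 1
    return score

def score_rating_item(first, second, weight):
    """Rating item (e.g., Q10): (second - first) * w, with weight in {-1, 0, 1}."""
    return (second - first) * weight

# Worked example from the text: item 10.a (weight -1), first 4 then 2 -> +2.
print(score_rating_item(4, 2, -1))   # 2
print(score_rating_item(2, 4, -1))   # -2
```

Note that responses that are identical in both rounds contribute nothing in every case, so only changes between pre- and post-test affect the gain score.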

Open question assessment

From the 41 questions, 6 were open questions of the form: ‘What is your understanding of [a topic]?’ or ‘Describe why you [responded to the antecedent quantitative question as you did]?’ These open questions were analysed separately because they were not held to be immediately comparable to the quantitative questions in terms of scoring. They represented an opportunity for wider contemplation when answering and thus enabled responses that might (or might not) be evaluated as containing Bloom’s ‘level 6’ knowledge.

Supplying open questions and evaluating them separately was aimed at investigating whether being asked to reflect on the game play would improve and deepen understanding of the topic, especially in a social group where the ‘social norm’ would help to prevent the participants from reflecting off-topic (daydreaming). The evidence was expected to present in the difference between the quality of first and second round responses to the six open questions.

This difference was made visible by applying evidential criteria to assessing the match between pre- and post-game answers and ‘level 6’ of Bloom’s taxonomy. In other words, each of the six criteria below was applied to the material at hand, once for the first answer and once for the second, and the analyses compared.

1 Indications of finding central factors or components of a phenomenon, understanding how they are related to one another, being able to compare the phenomenon to other similar phenomena, and pondering the differences, commonalities and reasons for differences.

2 Indications of being able to take into account different perspectives (Israel, Palestine, rest of the world).

3 Indications of being able to approach the phenomena from different themes (economy, politics, religion, power, history, culture . . .).

4 Indications of being able to apply a principle derived from or presented in the game to another context, or indications of being able to import principles (e.g., to pieces of one’s experience and knowledge domain) from one’s other contexts into the game thematic context.

5 Indications of being able to reflect upon the principles presented in the game and to assess their validity and applicability in other contexts, evaluating the boundaries and terms of the applicability, for example, in relation to time and place.

6 Indications of self-reflection or questioning of one’s own beliefs.

All responses to the 6 open questions from the 45 participants were first filtered according to the simple constraint of meeting the Bloom level 6 description cited above. As predicted, only a small percentage of answers per question succeeded in meeting the criteria; most of the responses consisted of a few sentences without indications of wider contemplation of the topic.

The qualified responses were then evaluated by manual assessment, according to the six detailed criteria above, maintaining consistency and impartiality by using two separately working assessors following a fixed protocol for each question. After this parallel reflection phase, a concerted discussion was held to evaluate how well, and in what proportion, the qualified responses met the criteria; this also determined the inter-rater reliability in applying the criteria to the responses.


Appendix II

Appendix II describes elements from the context of the experiment, useful for understanding the experience of the participants, but not vital to the presentation of results.

The game

In Peacemaker (Figure B1), players act as a regional leader (Israeli or Palestinian) and play is oriented around strategic management of conflict, taking governmental actions as shown by the menu on the left.

Conflict is modelled by factions/stakeholders who each have approval ratings for the player – this information can be obtained by clicking on a faction’s icon. ‘Spontaneous’ events are reported as news (marked on the screenshot by reticules) and drive the game narrative; as the player’s approval ratings with a particular faction vary, these events become more or less critical (in the screenshot, crisis is indicated by the colour of the reticule). Events and player actions combine to drive approval ratings – winning is defined as achieving 100/100 on both the Israeli and Palestinian ratings (see bottom left), while losing happens after scoring -50/100 on either.
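The stated win/lose thresholds amount to a simple rule, which can be expressed as follows (our own sketch for clarity; the game’s internal logic is not public and the function name is ours):

```python
# Win/lose rule as stated above (illustrative sketch; names are ours).

def game_outcome(israeli_rating, palestinian_rating):
    """Return 'win', 'lose' or 'ongoing' given the two approval ratings (out of 100)."""
    if israeli_rating >= 100 and palestinian_rating >= 100:
        return 'win'       # 100/100 on both sides
    if israeli_rating <= -50 or palestinian_rating <= -50:
        return 'lose'      # -50/100 on either side
    return 'ongoing'
```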

The protocol

The protocol described in the paper is illustrated below in Figure B2.

Detailed data descriptions

The full list of ancillary data gathered in the experiment is given in Table B1 below. Such background data may help differentiate between players by aptitude within the protocol, because the activity of game playing can be easier for those who regularly play games, even if they are different games; while the content may be more familiar to those with a connection to the region or cultures therein. Subjective experience of the game play session is also important, because learning and task performance may be affected by engagement and mood.

All these variables were tested for independence from the factor condition using one-way ANOVA – only Connection to Conflict was rejected by this testing. Tests of normality using Shapiro–Wilk were also performed for each variable (split by condition), the result of which determined how we could analyse the variable: as a factor or covariate in the models with condition as IV, and as a predictor of gainExp or not. The status of each variable is given in the column ‘SW’, where N = non-normal (for both condition groups) and Y = normal (for both condition groups).

For self-reports that were taken twice, once after each game-play session (SAM and PCS), two modes of combining the two separate scores were used in the final analysis. The difference score shows the change in participant attitude between first and second game play

Figure B1 Screenshot of the Peacemaker Game


sessions. The sum of scores helps show the overall attitude of the participant. Gathering and analysing such factors and covariates helps explain additional variance between participants, not attributable to the variation between conditions.

Additional results

Some results were obtained from analysis that did not dramatically affect the picture described by the main finding. However, they are worth examining for context.

Based on the additional models run with covariate groups, a significant prediction was found for age, F(1,38) = 7.5, p < .01, η² = .16, and a marginally significant one for challenge (from the iGEQ instrument), F(1,36) = 3.4, p < .1, η² = .09.

This result for age does not interact with condition and is difficult to interpret, because the only variable that significantly predicts age is education (by ANOVA, F(1,39) = 8.7, p < .01), for an obvious reason (higher age correlates with more education). Yet, for the same reason one might expect the relationship between age and learning score to be positive, not negative. However, we suspect that age may have had an interaction with general computer familiarity (younger players may have had slightly more, and more frequent, exposure to computer use: although game play frequency did not predict age, this is not precisely the same measure as general computer literacy).

There are a range of marginal or non-significant relationships between gainExp scores and other variables. Primarily, the covariate challenge was marginally significant, such that condition 1 reported a lower average challenge than condition 2. Also, for instance, both conditions show a negative correlation between learning and amount of time spent in the game tutorial (r(45) = -0.2 for both conditions), and a positive correlation between game play frequency and learning scores (r(20) = 0.32 for condition 1, r(25) = 0.08 for condition 2). Taken together, this suggests that those regular game players who showed less reliance on the tutorial had an advantage in answering the Bloom 1–5

Figure B2 The Experiment Protocol Setting out Activities for Two Conditions in Five Phases. [Diagram: both groups (G1: 20 subjects, playing 1 at a time; G2: 25 subjects, playing 2 or 3 at a time in a separate location) pass through five phases: (1) assessment A on topics relating to content C, in an exam setting (quantitative structured questions and qualitative open questions); (2) Peacemaker tutorial and game session 1, scenario S with learning content C; (3) G1 complete an emotional self-report and proceed straight to session 2, while G2 complete an emotional self-report, are colocated, and undertake a guided group reflection on in-game performance; (4) game session 2, continuing to play scenario S with learning content C; (5) repeat of assessment A in an exam setting, followed by an end interview.]

Learning and reflection in serious games 367

© 2013 John Wiley & Sons Ltd

questions. Finally, in terms of the game score, the success achieved by players varied considerably, but in general condition 1 players seemed to score higher than condition 2, suggesting that better players also performed better in the learning test.

Interest in the experience was balanced across conditions, and those who reported being interested had roughly equal learning scores to those who said the opposite. However, for each learning score, the correlations were not strong: condition 1 gainExp r(20) = -0.10, condition 2 gainExp r(25) = -0.12; condition 1 open r(20) = 0.17, condition 2 open r(25) = -0.32. Those who said they learned something definitely scored better in open questions than those who said they didn’t, but with no great difference between conditions – and those who didn’t mention it scored best of all. Again, correlations were low across conditions and scores.

References

Anderson, L. W., Krathwohl, D. R., & Bloom, B. S. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York, NY: Longman.

Andresen, L., Boud, D., & Cohen, R. (1995). Experience-based learning. In G. Foley (Ed.), Understanding adult education and training (2nd ed., pp. 225–239). Sydney, Australia: Allen and Unwin.

Barab, S. A., Gresalfi, M. S., & Ingram-Goble, A. (2010). Transformational play: Using games to position person, content, and context. Educational Researcher, 39, 525–536.

Bateman, C., & Boon, R. (2005). 21st century game design. London: Charles River Media.

Biederman, I., & Vessel, E. (2006). Perceptual pleasure and the brain. American Scientist, 94, 247–255. doi:10.1511/2006.3.247

Table B1. List of Factors and Covariates Gathered to Explain Additional Variance Between Participants

Condition – Conditions 1 and 2 – SW: N/A
Age – Recorded as a yearly value – SW: Y
Time to answer Q1 – Time spent answering the questions before playing – SW: Y
Time to answer Q2 – Time spent answering the questions after playing – SW: Y
%Change: Time to Ans – Second answering time as a percentage of first answering time – SW: Y
%Change: Open answer length – Length of 2nd answers to open questions as a percentage of length of 1st answers – SW: Y
Dominance (SAM) [sum] – Players rate how dominant (in control) they felt after each game play session (two responses) on a 9-point illustrated Likert scale – SW: Y
Dominance [difference] – SW: Y
Valence (SAM) [sum] – Players rate how unpleasant or pleasant they felt after each game play session (two responses) on a 9-point illustrated Likert scale – SW: Y
Valence [difference] – SW: N
Arousal (SAM) [sum] – Players rate how excited they felt after each game play session (two responses) on a 9-point Likert scale – SW: Y
Arousal [difference] – SW: N
Motivation (SDT) [sum] – Players rate how competent/motivated they felt after each game play session (two responses) on a 7-point Likert scale – SW: Y
Motivation [difference] – SW: Y
Flow (iGEQ) – Players rate how in flow they felt during play, on two scales – SW: Y
Competence (iGEQ) – After both game play sessions (once for all play), players rate their feeling of competence in play on two 5-point Likert scales – SW: N
Sensory and imaginative immersion (iGEQ) – After both game play sessions (once for all play), players rate their feeling of immersion in play on two 5-point Likert scales – SW: N
Tension (iGEQ) – Players rate how tense they felt during play on two scales – SW: N
Challenge (iGEQ) – Players rate how challenged they felt during play on two scales – SW: N
Negative affect (iGEQ) – Players rate how unpleasant they felt in play on two scales – SW: N
Positive affect (iGEQ) – Players rate how pleasant they felt while playing on two scales – SW: N
Tutorial time – Time spent on the tutorial for the game before playing – SW: N
Gender – Male/Female – SW: N
Education – 1 = degree, 2 = still studying, 3 = other – SW: N
Ethnic background – Open answer allowed. Ultimately this value was not codified – SW: N/A
Languages – Open answer allowed. Ultimately this value was not codified – SW: N/A
Religious views – 1 = none, 2 = atheist/agnostic, 3 = Christian, 4 = Jewish, 5 = other – SW: N
Connection to conflict – If the subject has any personal connection to the Israel–Palestine situation, 1 = yes, 2 = no – SW: N
Game-play frequency – Regularity of playing games on a 5-point Likert scale – SW: N


Blunt, R. (2009). Do serious games work? Results from three studies. eLearn, 2009(12), 1. doi:10.1145/1661377.1661378

Boud, D., Keogh, R., & Walker, D. (1985). Reflection: Turning experience into learning. London; New York: Kogan Page; Nichols Pub.

Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: The self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25, 49–59.

Burak, A., Keylor, E., & Sweeney, T. (2005). PeaceMaker: A video game to teach peace. Intelligent Technologies for Interactive Entertainment (pp. 307–310).

Cook, D. (2006). The chemistry of game design. Gamasutra. Retrieved from http://www.gamasutra.com/view/feature/1524/the_chemistry_of_game_design.php

Cowley, B., Bedek, M., Heikura, T., Ribiero, C., & Petersen, S. (2012). The QUARTIC process model to support serious games development for contextualized competence-based learning and assessment. In M.-M. Cruz-Cunha (Ed.), Handbook of research on serious games as educational, business and research tools: Development and design (pp. 491–519). New York, NY: IGI Global.

de Freitas, S. (2006). Learning in immersive worlds. Report of the Joint Information Systems Committee. Bristol, UK.

de Freitas, S., & Griffiths, M. (2008). The convergence of gaming practices with other media forms: What potential for learning? A review of the literature. Learning, Media and Technology, 33(1), 11–20.

Dewey, J. (1910). How we think. London; Cambridge, MA: D. C. Heath and Co.

Egenfeldt-Nielsen, S. (2006). Overview of research on the educational use of video games. Nordic Journal of Digital Literacy, 1, 184–213.

Ewell, P. T. (1997). Organizing for learning: A point of entry. National Center for Higher Education Management Systems (NCHEMS).

Freire, P. (1974). Education for critical consciousness. London: Sheed and Ward.

Gee, J. P. (2003). What video games have to teach us about learning and literacy. New York, NY: Palgrave Macmillan.

Gee, J. P. (2006). Are video games good for learning? Nordic Journal of Digital Literacy, 1, 172–182.

Gobet, F. (1998). Expert memory: A comparison of four theories. Cognition, 66, 115–152.

Gonzalez, C., & Czlonka, L. (2010). Games for peace. In J. Cannon-Bowers & C. Bowers (Eds.), Serious game design and development (pp. 134–149). Hershey, PA: Information Science Reference.

Gribble, J., Meyer, L., & Jones, A. (2003). Quantifying and assessing learning objectives. Working Paper Series Paper no. 112, Centre for Actuarial Studies, University of Melbourne, Australia. Retrieved from http://repository.unimelb.edu.au/10187/665

Guay, F., Ratelle, C. F., & Chanal, J. (2008). Optimal learning in optimal contexts: The role of self-determination in education. Canadian Psychology, 49, 233–240.

Guillén-Nieto, V., & Aleson-Carbonell, M. (2012). Serious games and learning effectiveness: The case of It’s a Deal! Computers and Education, 58(1), 435–448. doi:10.1016/j.compedu.2011.07.015

Ijsselsteijn, W. A., de Kort, Y. A. W., & Poels, K. (no date). The Game Experience Questionnaire: Development of a self-report measure to assess the psychological impact of digital games. Manuscript in preparation.

Jonassen, D. H. (1994). Thinking technology: Toward a constructivist design model. Educational Technology, 34(4), 34–37.

Ketelhut, D. J. (2006). The impact of student self-efficacy on scientific inquiry skills: An exploratory investigation in River City, a multi-user virtual environment. Journal of Science Education and Technology, 16, 99–111. doi:10.1007/s10956-006-9038-y

Kirschner, P. A., Sweller, J., & Clark, R. E. (2006). Why minimal guidance during instruction does not work: An analysis of the failure of constructivist, discovery, problem-based, experiential, and inquiry-based teaching. Educational Psychologist, 41(2), 75–86.

Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development. Englewood Cliffs, NJ: Prentice-Hall.

Koster, R. (2005). A theory of fun for game design. Scottsdale, AZ: Paraglyph Press.

Locke, J. (1690). An essay concerning humane understanding: In four books. London: Basset.

Lord, T. R., French, D. P., & Crow, L. W. (2009). College science teachers guide to assessment. Arlington, VA: National Science Teachers Association.

McGinnis, T., Bustard, D. W., Black, M., & Charles, D. (2008). Enhancing e-learning engagement using design patterns from computer games. Proceedings of the First International Conference on Advances in Computer-Human Interaction. Sainte Luce, Martinique: IEEE Computer Society.

McMahon, M. (1997). Social constructivism and the world wide web – A paradigm for learning. Proceedings of the conference of the Australasian Society for Computers in Learning in Tertiary Education (ASCILITE). Perth, Australia.


Miettinen, R. (2000). The concept of experiential learning and John Dewey’s theory of reflective thought and action. International Journal of Lifelong Education, 19, 54–72.

Mogle, J. A., Lovett, B. J., Stawski, R. S., & Sliwinski, M. J. (2008). Research report: What’s so special about working memory? An examination of the relationships among working memory, secondary memory, and fluid intelligence. Psychological Science, 19, 1071–1077.

O’Neil, H. F., Wainess, R., & Baker, E. L. (2005). Classification of learning outcomes: Evidence from the computer games literature. Curriculum Journal, 16, 455–474.

Ritterfeld, U. (2009). Serious games: Mechanisms and effects. New York, NY: Routledge.

Salen, K., & Zimmerman, E. (2004). The rules of play: Games design fundamentals. Cambridge, MA: MIT Press.

Scholz, J., Klein, M. C., Behrens, T. E. J., & Johansen-Berg, H. (2009). Training induces changes in white-matter architecture. Nature Neuroscience, 12, 1370–1371. doi:10.1038/nn.2412

Sutton-Smith, B. (1997). The ambiguity of play. Cambridge, MA: Harvard University Press.

Vygotsky, L. S., & Cole, M. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.

Wertenbroch, A., & Nabeth, T. (2000). Advanced learning approaches and technologies: The CALT perspective. Fontainebleau Cedex, France: Centre for Advanced Learning Technologies.

Wetzel, K., & Strudler, N. (2006). Costs and benefits of electronic portfolios in teacher education: Student voices. Journal of Computing in Teacher Education, 22(3), 99–108.

Wong, W.-L., Shen, C., Nocera, L., Carriazo, E., Tang, F., Bugga, S., ... Ritterfeld, U. (2007). Serious video game effectiveness. Paper presented at the Proceedings of the International Conference on Advances in Computer Entertainment Technology.

Zapata-Rivera, J., & Greer, J. E. (2003). Analysing student reflection in the learning game. Paper presented at the AIED workshop on Learner Modelling for Reflection. Retrieved from http://www.eee.bham.ac.uk/bull/ws/aied03
