
Entertainment Computing 4 (2013) 25–38


Evaluating player experience for children’s outdoor pervasive games

Iris Soute, Saskia Bakker, Remco Magielse, Panos Markopoulos
Department of Industrial Design, University of Technology Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands

Article history: Received 22 August 2011; Revised 3 September 2012; Accepted 23 September 2012; Available online 5 October 2012

Keywords: Children's games; Player experience; Evaluation methodology; Pervasive games

Abstract

There is a growing body of research in pervasive outdoor gaming, mainly focused on adult players playing games on smart phones. Published evaluations of the player experience in such games are largely based on anecdotal descriptions and post-play surveys. The latter approach is especially challenging to apply when the play test participants are children. Observations of game play so far have been ad hoc, relying on unstructured observation, which makes it difficult to extract reliable conclusions from observations and to draw comparisons between different games. In this paper we present two methods developed specifically for evaluating the player experience in children's outdoor games: the Outdoor Play Observation Scheme (OPOS) and GroupSorter. We discuss their application in three different case studies and conclude that OPOS is useful in quantifying the different types of play behavior in outdoor games; GroupSorter adds qualitative data on the play experience. Moreover, the application of GroupSorter is not limited to game development but can be used for obtaining user input in other contexts as well.

1. Introduction

A consequence of the current wide adoption of mobile computing is the emergence of mobile and pervasive gaming, where mobile interactive devices with computing and communication capabilities (typically smart phones) are used to play games outdoors. These games may involve one or more players who may be distributed or co-located, and where game play can take place in the broad variety of locations and contexts where one might expect such mobile devices to be used. This class of games represents a vast new growth area for mobile interactive technologies but also, we argue, a new set of methodological challenges relating to the user centered design of related systems. Research in this area has progressed largely in terms of developing research prototypes and charting the related technical, interaction, and game design challenges. For example, the pioneering pervasive game Can You See Me Now? [1] is a mixed reality chasing game intended for adults dispersed in an urban environment; the emphasis of the researchers was on exploring and demonstrating the limitations but also the opportunities related to creating pervasive games that rely on Wi-Fi and GPS infrastructures. Similar examples are CatchBob! [2] and Feeding Yoshi [3]. In these projects the evaluations are focused more on the technological innovations, and on exploring the nature of the emerging user experiences and less on the ambition to just create a fun and playable game.

As the transition is made from the initial pioneering phase to more routine development and eventually adoption of such games, the need emerges for appropriate evaluation methodology. To this point there has been no systematic effort on this front. Partly, this is a consequence of the novelty of the field: by its nature methodology research inevitably lags behind design innovations. Then again it could reflect expectations by designers and researchers that traditional user centered design methodology [4] or more recently experience design [5] suffices for the purposes of mobile and pervasive game design, perhaps with minor or major adaptations to fit the specific design context. While for many cases this may hold true, our particular interest in designing mobile and pervasive games for children leads us to a different position.

The research reported here is part of an investigation into an emerging genre of mobile and pervasive games intended for child players; we use the term Head Up Games (HUGs) to describe co-located outdoor games supported by technology and targeting children. Head Up Games were introduced by Soute et al. [6,7] in juxtaposition to the prevalent, at that time, interest of the research community in games for adult mobile players that superimpose a virtual world onto physical environments. Typically such games require players to play 'head-down', attending to their mobile device and interacting through the mobile devices and network rather than directly with co-located players. While such games do have substantial merit, young children in particular need the rich face-to-face social interaction and intense physical activity typical of their non-technology based outdoor games for developing their social and physical skills [8] and flexibility with respect to the physical context. In Soute and Markopoulos [6] and Soute et al. [7] we put forward Head Up Games as a reaction to pervasive games for adults, aiming to inject modern pervasive gaming with some of the traits and advantages that have characterized traditional outdoor play for children over the centuries.

Comparable motivations have inspired researchers to explore interactive games and playful installations, where children can play together and where physical activity is an inherent part of the play experience. Examples of such research are interactive games that support exertion [9], interactive installations that support groups in being physically active such as the interactive slide or the interactive fountain [10], the enhancement of sports with sensor-based interactive technology [11], etc. A common goal for all these genres of interactive technology is physical activity and social interaction and, of course, the designer's intent to embed these in an engaging and fun activity. These shared design goals bring about recurring methodological challenges with regard to evaluation. The involvement of children already presents a distinct set of methodological challenges for evaluators that require either the adaptation of evaluation methodologies originally developed for adult test participants or the invention of novel evaluation methods [12]. Further, the mobile context, the physical activity and the open space limit the applicability of testing methods originally intended for a laboratory context and a static interaction platform such as a desktop computer.

In the next section we elaborate on these two points, in order to motivate the development of specialized methodologies for testing mobile games for children, and we review relevant existing research in similar areas. Next, we present two methods we have developed specifically for the purpose of evaluating outdoor games with children: the Outdoor Play Observation Scheme (OPOS), a structured observation method for evaluating emerging play behaviors, and GroupSorter, a method suitable for children aged 7 and up that enables evaluating the experienced fun as well as the rationale behind it. Section 4 provides the details on the application of both methods in three case studies. Subsequently, Section 5 reflects on the usage of the methods and provides ideas for further improvement. Finally, we conclude with a discussion in Section 6.

2. Related work

This section is divided into three parts: a review of existing evaluation methods for play, games and physical activity in general, a review of games that have been developed that are closely related to Head Up Games, and, concluding, a review of child-centered usability evaluation methods.

2.1. Related evaluation methods

Let us start this review by stating that there are many different definitions of both 'play' and 'games' and that there exists a 'surprisingly complex relationship' [13] between the two. Furthermore, there are many theories on why play exists and the role of play in child development; for an overview of both classic as well as modern theories we refer the reader to [14]. As there is no consensus on the exact definition of 'play' it is not hard to see that there exists no 'gold standard' for assessing play, and, unsurprisingly, many different approaches in as many different disciplines are taken. We highlight several of these, though this is by no means an exhaustive review.

2.1.1. Study of play

In the area of developmental psychology play is regarded as an essential aspect in a child's development [8] and studies of children's play often focus on the social development of young children. Assessment of children's play is most commonly done by observation.

For instance, Parten [15] conducted one of the very first observational studies of children's play. She defined an observation scheme to study the social participation of preschoolers (2–4 years) in spontaneous play using a one-minute sampling method. For 4 months, children were observed 1 h per day in which they were free to choose what and with whom to play. Parten defined a scale for classifying children's social participation in play, ranging from non-social play ('solitary play') to play involving high social participation ('cooperative or organized supplementary play'). From her observations she concluded that social participation increases with age [15].

Rubin's [16] Play Observation Scheme (POS) combined Parten's work with the Smilansky classification of play behavior in an observation scheme that classifies both social play and cognitive play. POS has been used in several projects investigating the free play behavior of preschoolers, e.g., [17] and [18]. Rubin [17] applied POS to compare free-play behaviors of preschool- and kindergarten-aged children. One of the main conclusions drawn was that kindergarten children engage more in group and dramatic play than preschool children. Hetherington et al. [18] applied a slightly modified version of POS to study the effect of divorce on social interaction and play of preschool children. The children were observed during six sessions over the course of 2 years after their parents divorced. Based on the results Hetherington et al. concluded, amongst others, that compared to children of non-divorced families the play patterns of children from divorced families were more fragmented and less socially mature during the first year after the divorce.

Metin [19] studied children's play in a playground and the effect of the equipment design on their play behavior. She classified behavior in behavior patterns (e.g., talking, pretending) and play types (e.g., sensorimotor play, pretend play, games with rules). Seventy children, aged 6–12, were observed in a park; after the observation session a short interview was conducted. The results of the study showed that today's playground has little value in terms of play. Physical and social development are supported to an extent, however cognitive and emotional development are not fostered.

2.1.2. Study of games

Up until the arrival of computer games, the study of games was sparse. A few notable exceptions are Johan Huizinga [20] and Roger Caillois [21] who discussed games from the perspective of culture and sociology. However, as computer games rose in popularity, the interest from the academic field in gaming also grew. For example, the influence of violent video games on aggression has often been studied [22,23]. Experiments typically use questionnaire data, sometimes combined with physiological measurements, to assess the participants' attitude towards violence, aggressive behavior, aggressive affect or aggressive cognitions.

The application of digital games for children has often been studied from an educational perspective [24]. Typically in the field of HCI educational games are evaluated using cognitive tests, questionnaires (e.g., [25]), and interviews (e.g., [26]).

Social aspects of video games that are studied are, e.g., the feeling of presence [27] and player enjoyment and engagement [28]. De Kort et al. developed the Social Presence in Gaming questionnaire (SPGQ) to measure the player's feeling of social presence in digital games. Chen et al. interviewed MMORPG (Massively Multiplayer Online Role-playing Games) players in semi-structured interviews to get a holistic account of the players' gaming experience.

Finally, digital games are also researched to find out what makes them fun for children [29]. To this end Barendregt et al. have developed the PIPC method: the problem identification picture cards method. This method combines the traditional think-aloud method with picture cards that indicate certain types of usability or fun problems. Children can place the appropriate card in a box when they encounter a problem while playing a computer game.

2.1.3. Physical activity in a play context

As it is becoming more and more clear that a sedentary lifestyle can cause health problems, research into encouraging physical activity is growing. Dollman et al. [30] list eight approaches for physical activity assessment of children, such as heart rate monitoring, accelerometry and direct observation. Each of the tools has its own merits and limits with respect to both validity of the measures as well as the practical applicability. Mostly, these tools are applied in the context of obesity research, or in clinical applications.

More specifically in the context of outdoor play, Haug et al. [31] use questionnaires administered to principals and students in order to determine the relationship between the outdoor environment and the participation in physical activity during school breaks of children age 8–15. Furthermore, the SOPLAY observation scheme [32] records play activities of groups. Every interval, typically every few minutes, an observer scans a target area, noting the number of participants, their physical activity level and other contextual characteristics.

2.1.4. Conclusion

Summarizing, we can say that play, games and physical activity are studied in many different domains from many different perspectives. Though a variety of methods is applied, we conclude that observations, questionnaires, and interviews are most commonly used.

2.2. Related design work

Head Up Games [7] is a genre of mobile outdoor games for children, where children are co-located, and where technology supports and enhances play as seen in traditional children's outdoor games such as hide and seek, and tag. Similar to traditional games, HUGs emphasize portability, mobility and physicality of game play, social interaction and minimal requirements on existing infrastructure. Several game genres that have a close relation to HUGs include pervasive games, exertion games, and open-ended play. In this section we will look closer at a selection of these games, in particular with respect to the evaluation methods used in these projects.

The field of pervasive games is growing rapidly. Examples of pervasive games for adults are Can you see me now? (CYSMN) [1], and PacManhattan [33]. Both games are location-based games, targeted at adults. Both games mix online and offline play to create a chase game that is played out in an actual cityscape. CYSMN was evaluated using naturalistic observations, by reviewing system logs of usage and through discussions with participants. For PacManhattan it is unclear how it was evaluated.

Pervasive games that have been specifically designed for children are Savannah [34] and the Hunting of the Snark [35]. Both games are collaborative educational games for children. The evaluation of Savannah entailed unstructured video observation of the game play combined with system logs of the game interactions. Based on these data, usability issues with the system were identified. In the Hunting of the Snark a more structured observation scheme was used that focused on various aspects of playful learning.

Another body of research focuses on interactive play objects [36]. Similar to HUGs, the goal here is to support activities that are rich in social interaction and physical activity. The ColorFlares [37] are an example of interactive play objects. A user test was carried out with ColorFlares to test whether children created their own games and to see whether open-ended play stimulated social interaction. Play sessions were recorded for analysis of play behavior, i.e. the type and number of games children created. After the play session children were asked to fill in a questionnaire. The study showed that children were in fact able to come up with diverse games that incorporated the ColorFlares, and that the children enjoyed doing this.

Furthermore, games that are specifically designed to stimulate physical activity are dubbed exertion games [9]. For example, the game Breakout for two [9] is a sports game that is played by two players connected through the Internet. The game has been evaluated using questionnaires and interviews. Another example of a game that requires more physical activity is the adaptation of the Nintendo game Donkey Konga [38], where bongos as input devices replaced the standard game controllers. This game was evaluated in a study comparing the standard game controllers with the bongos. The data was gathered using questionnaires to measure engagement and by analyzing video footage to measure the amount of gestures and social interaction. To analyze the social interaction, definitions based on the Autism Diagnostic Observation Schedule were used. Based on the analysis of the video footage and the questionnaires the researchers concluded that the social interaction in the game with the bongos was significantly higher, and that this did not detract from the engagement in the game.

2.3. Related methods in usability evaluations with children

In this section we review methods for usability evaluation with children and discuss their applicability in outdoor games.

Markopoulos et al. [12] divide usability evaluation methods for and with children roughly into: observational methods, verbalization methods, survey methods, and inspection methods.

2.3.1. Verbalization methods

Evaluation of interactive systems with verbalization methods such as the Think-Aloud method [39] aims to capture the thought processes of test participants, by asking them to verbalize these during a test session. Think-aloud is often used in usability testing [40,41], and research has shown that it is an effective means to identify improvements for interaction design [42]. Nevertheless it has some well-known drawbacks. The first drawback concerns the extra cognitive load required for test participants to verbalize their thoughts. The second drawback concerns the difficult social situations that might arise when a test facilitator attempts to avoid interacting with test participants so as not to influence their thought processes [43]. These two core issues are even more pronounced when test participants are children, raising doubts regarding the applicability and utility of the method for children. Research has established that it is feasible and effective also with child participants above age 9 [44], though this method assumes that the participants can talk while interacting with the product, can spare the cognitive resources and can be heard talking while they do so, which is possible when they remain stationary in one location during the course of the evaluation. These conditions are less likely to hold during the evaluation of Head Up Games, which are often medium to high paced activities and are played with many players. Furthermore, the game play makes it virtually impossible to verbalize thoughts, as the game itself requires the players to communicate with other players. Consequently, verbalization methods are deemed inappropriate for the type of games that we want to evaluate.

2.3.2. Inspection methods

Inspection methods are methods in which one or more experts in the field analyze the (designed) interaction. In a study on the usefulness of playability heuristics in pervasive game evaluations, Jegers [45] found that although many issues could be identified by heuristics, some major issues were not identified. For similar reasons, using inspection methods is not common practice in game design. Salen and Zimmerman [13] state that it is nearly impossible for "even a veteran designer [to] exactly predict what will and will not work before experiencing the game firsthand". Salen and Zimmerman, as well as Fullerton et al. [46], emphasize the importance of play testing games. Based on our experiences with game design and evaluation, we agree on this, and therefore conclude that inspection methods are not appropriate for our purpose.

2.3.3. Observation methods

Numerous observation methods for adult test participants are used for evaluation, ranging in their rigor and their completeness. There is a focus primarily on identifying usability problems rather than on evaluating fun and the overall experience. In most cases, observation is unstructured, although in some cases structured scoring sheets are created to help code observations in a structured way. An example is the DEVAN scheme for coding observations during usability testing with adults [47]. Such observation schemes, which require comparing interaction at a micro level (clicks, selections, etc.) to verbalization and screen contents, are hardly able to cope with the fast pace of the outdoor game, the wide range of the playing field, and the intense experience that needs to be evaluated.

Focus on children as test participants prompted Barendregt to extend the DEVAN observation scheme with elements referring to playing games [48]. However, such coding schemes are very much tied to the problem-solving aspects of learning a new game and thus focus on how displays are interpreted and how the user can figure out how to proceed with interaction; this cognitive element of gaming is still present in outdoor games but less central to the game play and the emerging play experience, especially in mobile games. More central to the success of the game, and therefore critical for the evaluator, are the physical and social interaction components of games that are currently not addressed in these observation schemes.

2.3.4. Survey methods

Interviews or questionnaires are often administered to obtain children's opinions about a specific product or system. Because they are issued after using the product they avoid the problems of verbalization during use as discussed above, and they can equally well be applied in mobile contexts. Both interviews and questionnaires need to be adapted when test participants are children to address their level of language development, interests, and skills. Methodologists have identified several pitfalls when surveying children's opinions during usability testing [49]: satisficing, which occurs when a child gives a more or less superficial response simply to complete their task (e.g., marking randomly one of the multiple choice options in a questionnaire), or suggestibility, where children's responses reflect their perceived influences from the testing context, e.g., adults, environment, etc.

Applying lengthy questionnaires to children can pose a problem, as reading skills and the ability to remain focused on a (long) list of questions are not yet properly developed in children. Also, writing skills may vary largely over age groups, and so questionnaires for children require special consideration concerning, e.g., length and wording [12], but also with regard to the validity of the answers given; as noted by Read and MacFarlane [50], children tend to give maximum scores indiscriminately on rating scales, not allowing useful conclusions to be drawn with respect to how they evaluate their experience with games. A set of instruments and tips for surveying children's opinions has thus been developed, for a review see [12], and these are largely applicable in the present context. These instruments are often used in classroom settings or in laboratory situations and are easier to administer as questionnaires or in one-to-one interviews. Adaptations may be needed when working with a group of children in between outdoor play sessions or just after them.

2.3.5. Conclusion

Concluding, we can say that observation and survey methods are the two most likely types of methods to gain useful insights into the play behaviors and play experience and, at the same time, are most appropriate for children in an outdoor context. However, the methods in their original form needed to be tailored to our specific needs, which resulted in the two methods we present in this paper: OPOS and GroupSorter.

3. Methods

From the above review and discussion it is clear that there are several requirements that our particular problem domain puts upon evaluation methodology. First, we address children, so the method must be appropriate for them. Second, the children play outdoors and are quite animated during play, so the evaluation needs to fit their play activities. Finally, we need to address them as a group: evaluation sessions will, by the nature of these games, be carried out with groups of children. The opinions of children have to be collected efficiently, avoiding problems such as children talking to each other about the evaluation or even the interview while each is waiting for his/her turn to be interviewed.

In this section we will elaborate on the two methods we have developed for evaluating computer enabled outdoor games for children. The methods complement each other: one, OPOS, gathers quantitative data; the other, GroupSorter, gathers qualitative data.

3.1. OPOS

Head Up Games are intended to stimulate physical activity and social interaction, which are essential aspects of traditional outdoor children's games. Both physical activity and social interaction are possible to observe directly, so the evaluation of these games (and other related genres of games) should not be limited to surveying children's experience and reporting anecdotes of play behaviors only. Rather, designers and researchers need observational methods that evaluate the extent to which these behaviors emerge during game play. While existing evaluations of pervasive games reported in literature often mention that observations were carried out, these seem to be for the most part unstructured observations, and the analysis procedure is frequently undisclosed (e.g., Savannah [34], and Ambient Wood [51]). While unstructured observation can be sufficient in exploratory evaluations, structured observations can produce more informative and reliable results, particularly for research studies aiming to validate a particular design concept or comparative evaluations of different games, or versions of a game.

Unfortunately, neither the literature on pervasive games nor the related discipline of behavioral analysis provides an appropriate coding scheme for our purposes. Thus, a new observation scheme was developed, called the Outdoor Play Observation Scheme or OPOS [52]. OPOS enables the collection of quantifiable data on physical activity and social interaction, and thereby allows a comparison to be drawn between different games (e.g., HUGs, traditional outdoor games, computer games). OPOS was developed in an iterative process, which included several direct observations of traditional (non-technological) outdoor games such as tag, hopscotch and hide and seek, as well as video observations of Head Up Games performed by multiple independent observers. The process of developing OPOS is extensively described in [52].

Table 2. Categories and subcategories of the observation schemes.

POS
  Social: Solitary play; Parallel play; Group play
  Cognitive: Functional play; Constructive play; Exploration; Dramatic play; Games-with-rules
  Non-play behaviors: Unoccupied; Onlooker; Transition; Active conversation; Aggression; Rough-and-tumble; Hovering; Anxious behavior; Uncodable; Out of room

OPOS
  Physical activity: Intensive; Non-intensive; No physical activity
  Focus: Looking at other players; Looking at game objects; Looking at something out of sight; Looking at something else
  Social: Functional; Non-functional positive/neutral; Non-functional negative; With non-player; Unintended physical contact

SOPLAY
  Physical activity: Sedentary; Walking; Very active


As we have seen in Section 2, several different observation schemes are known for assessing play and physical activity. From these, POS [16] and SOPLAY [32] are closest to the purpose and content of OPOS. We will briefly describe similarities and differences between these observation schemes.

Table 1 shows an overview of the characteristics of each of the observation schemes. SOPLAY focuses on measuring physical activity of large groups of children and its categories for observation are similar to those of OPOS. With SOPLAY, every few minutes an observer performs a scan of the play area and players, and notes what type of activity children are engaged in. Though the categories of physical activity in SOPLAY and OPOS are similar, the unit of analysis and the sampling frequency at which the behavior is observed are different. SOPLAY observes the children as a group, resulting in an overall figure of how active the group is; in contrast, in OPOS we observe each child separately. Also, we collect the data at a higher frequency, i.e. over seconds rather than over minutes. The game play may only last a few minutes in total, rendering a sampling frequency of every few minutes useless. Both collecting the data for each child instead of for the whole group and sampling at a higher frequency are needed for the benefit of the designer or evaluator of the game: knowing that the game leads to walking behavior overall is important, but does not help the designer evaluate specific design decisions or elements in the game.

POS is an extensive observation scheme to evaluate children's play in all of its forms. It encompasses the whole range of play behavior in the context of social development and interaction. In our Head Up Games some of these behaviors are by their nature not present, such as solitary play. Rather, in a typical Head Up Game only a subset of the behaviors listed in Table 2 is present. OPOS may therefore be viewed as a specialization of POS, focusing on specific behaviors for outdoor, multi-player games.

As OPOS is applied to each child separately, and taking into account the number of categories of behaviors, it is virtually impossible to perform coding in real-time, as is done with POS and SOPLAY. Instead, video recordings of children playing the targeted outdoor game are required for off-line analysis.

Thus, to apply OPOS to measure player behavior in outdoor game play, one must first record the game play on video. Since children are likely to be moving around the play area, we advise using at least two cameras to capture as much of the gameplay as possible. The next step is to code the videos using OPOS. For an explanation of all categories in OPOS, see Table 3. Though it is possible to code all behaviors manually, we advise using dedicated software as this greatly speeds up both the coding process and the analysis afterwards; we used Noldus Observer (http://www.noldus.com) for this. As indicated before, the OPOS scheme is applied to each child individually, watching both videos simultaneously to obtain a clear visual of a child's behavior.

Inevitably, games are seldom equal in duration. We therefore represent the behaviors that are duration-recorded as a percentage of the total playtime and calculate event-recorded behaviors as events per minute. Once all behaviors have been coded and analyzed, the resulting data can be used to compare different games.
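For evaluators who prefer to script this step rather than rely on dedicated software such as Noldus Observer, the bookkeeping is straightforward. The following is a minimal sketch, not part of the published OPOS tooling; the behavior labels and timing data are hypothetical and only illustrate how duration-recorded behaviors can be expressed as a percentage of playtime and event-recorded behaviors as events per minute.

```python
# Minimal sketch (not the published OPOS tooling): summarizing one child's
# coded observations. Behavior labels and data below are hypothetical.
from dataclasses import dataclass

@dataclass
class Interval:          # a duration-recorded behavior (e.g., from the 'physical activity' class)
    behavior: str
    start: float         # seconds from the start of the game
    end: float

@dataclass
class Event:             # an event-recorded behavior (e.g., a social interaction)
    behavior: str
    time: float

def summarize(intervals, events, playtime_s):
    """Duration behaviors as % of playtime; event behaviors as events per minute."""
    pct = {}
    for iv in intervals:
        pct[iv.behavior] = pct.get(iv.behavior, 0.0) + (iv.end - iv.start)
    pct = {b: 100.0 * d / playtime_s for b, d in pct.items()}

    per_min = {}
    for ev in events:
        per_min[ev.behavior] = per_min.get(ev.behavior, 0) + 1
    per_min = {b: n / (playtime_s / 60.0) for b, n in per_min.items()}
    return pct, per_min

# Example with made-up data for a 4-minute game:
intervals = [Interval("intensive physical activity", 0, 60),
             Interval("non-intensive physical activity", 60, 200),
             Interval("no physical activity", 200, 240)]
events = [Event("functional, with another player", t) for t in (12, 35, 80, 130, 190, 210)]
print(summarize(intervals, events, playtime_s=240))
```

Expressing every child's data this way makes games of different lengths directly comparable, which is the point the paragraph above makes.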

Table 1. Overview of observation schemes.

POS
  Aim: Evaluate social and play development of child
  Observation type: Direct, naturalistic observation
  Unit of analysis: One child
  Context: Free play
  Data collection: Time-sampling (partial interval-recording)

OPOS
  Aim: Evaluate play behavior triggered by computer enabled games
  Observation type: Indirect, naturalistic observation
  Unit of analysis: Each child in group of children
  Context: Outdoor game play
  Data collection: Duration recording and event recording

SOPLAY
  Aim: Evaluate physical activity in leisure time
  Observation type: Direct, naturalistic observation
  Unit of analysis: Group of children
  Context: Leisure time in school
  Data collection: Time-sampling (momentary)

When multiple independent observers are involved in the analysis, it is possible to assess the reliability of the observations.
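In the case studies below this reliability is expressed as Cohen's kappa. The sketch that follows is our illustration rather than part of OPOS itself; it assumes each coder has assigned exactly one category per fixed-length video segment, and the segment codes shown are invented.

```python
# Minimal sketch: Cohen's kappa for two coders who each assigned one OPOS
# category per video segment. Segmenting choices and labels are illustrative.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(coder_a) | set(coder_b)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Example: per-segment 'physical activity' codes from two independent coders.
a = ["intensive", "intensive", "non-intensive", "none", "non-intensive", "intensive"]
b = ["intensive", "non-intensive", "non-intensive", "none", "non-intensive", "intensive"]
print(round(cohens_kappa(a, b), 2))   # 0.74 for this made-up example
```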

3.2. GroupSorter

Although it is important to gather quantitative evidence for evoked play behavior (which can be achieved through OPOS), observation as such does not offer insight into what players wish, feel, or how they evaluate their play experiences. For this purpose, methods that elicit qualitative data from the participants are required to understand the game experience in general, but also to learn what elements of a game will evoke desired experiences (e.g., fun). This could give information to help improve existing games, as well as inform the design of new Head Up Games.

In the case of evaluating a multi-player game, we argue that it is most logical to question the children as a whole group, since they have experienced the game together. This will likely encourage discussions between the players and therefore result in richer data. To interview a group at once, a common method to use is a focus group, or group interview [53]. Focus groups have been used extensively to gather children's views and opinions, e.g., [54]. A group interview offers several advantages over an individual interview. It creates a more natural context for children to be interviewed: children feel more comfortable and relaxed in a peer setting. Furthermore, a greater openness and variety in responses can be expected [55], and there is less pressure on each child to respond to every question. Also, the focus group places the children in an expert position, which makes them feel less like being questioned and more as though they are sharing experiences with peers [54]. Finally, interviewing children as a group may reduce the power imbalance that exists between the adult interviewer and the child interviewee [55].


Table 3. The OPOS observation scheme.

Class: Physical activity
  Intensive physical activity: Exhausting physical activity that one cannot keep doing for a long period of time. For example: running, jumping or skipping.
  Non-intensive physical activity: Physical activity that one can keep doing for a longer period of time. For example: walking, moving arms or legs, bending and standing up, crawling, moving while staying on the same location, etc.
  No physical activity: Standing, lying or sitting still. Very small movements such as coughing, yawning, putting your hands in your pocket, looking at your watch, etc. while being still should also fall in this category.

Class: Focus
  Looking at other players: The player is looking at one or more other players. This does not only include looking at the face, but also looking at other parts of the body.
  Looking at game objects: The player is looking at one or more game objects. All things that are part of the game besides players and surroundings are game objects. For example a ball, a goal, a chalked circle on the ground, a hand held object, a token, etc.
  Looking at something out of sight, possibly part of the game: When game objects or players are out of sight of the camera and the observed player is looking in the direction in which these player(s) or object(s) likely are.
  Looking at something else: Looking at objects, people or surroundings that are not part of the game.

Class: Social interaction
  Functional, with another player: All interactions (verbal and nonverbal) that are functional for playing the game and directed to one or more other players or to no-one. For example instructions such as "give me the ball!", "get the monster-coin!" and "tag him!", or expressions like "John is it!", "tag!" or counting points aloud, or physical contact such as tagging, holding hands, etc. that are needed to play the game.
  Non-functional positive/neutral, with another player: All interactions (verbal and nonverbal) that are not functional for playing the game, that are positive or neutral and directed to one or more other players or to no-one. For example communication about subjects that are not related to the game, showing results to other players, cheering, screaming, expressions of enjoyment and physical contact not required for playing the game such as holding hands, high fives, etc.
  Non-functional negative, with another player: All interactions (verbal and nonverbal) that are not functional for playing the game, that are negative and directed to one or more other players or to no-one. For example negative communication such as swearing and bullying, expressions of pain or negative physical contact such as kicking or hitting.
  With a non-player: All interactions (verbal and nonverbal) that are directed to someone who is not a player in the game. This can be a researcher, a teacher, a parent, a peer who is watching the game, etc.
  Unintended physical contact: Physical contact that is not intended, such as accidentally bumping into another child.

Class: General
  In sight: In sight of the camera.
  Out of sight: Out of sight of the camera.


From a practical point of view doing group interviews is also convenient: interviewing all children together requires only one researcher to be present, and is easier to arrange to take place right after the game play, especially in school settings where our games are often evaluated. It takes up less time and planning for the teacher, compared to interviewing the children individually. Planning individual interviews consecutively can cause a long time lapse for the child between playing the game and being interviewed about it, which is less desirable.

Pitfalls that can occur in group interviews with children include intimidation or peer pressure that might inhibit participants from voicing their own opinion, or lead them to voice the opinion of another child in a desire to fit in with the rest of the group [54]. Even if such a desire is not present, a child might still repeat the opinion of another child, simply because he has listened to the other's opinion and readily agrees, before forming an opinion of his own (an effect known as 'cognitive tuning' [56]).

As has been pointed out by Hanna et al. [57], ranking is a valid way for children to indicate their preference, as opposed to ratings. Ratings often have a 'ceiling effect': children tend to give maximum ratings. An example of a ranking method specifically tailored for children is the Fun Sorter [50]. The Fun Sorter lets children rank items on one or more constructs, for example 'fun' or 'easy to use'.

We propose to combine rankings with the group interview to avoid some of the pitfalls mentioned above, as follows: first, we let the children rank elements of the game itself on the construct 'most fun'. The ranking is done individually. Subsequently, the ranking is repeated, but now as a group task in a focus group setting. However, the ranking of preferences is not per se the goal of the evaluation. While ranking in the group context, it is natural for children to discuss what element is ranked first, what element second, etc. and to provide arguments to their peers regarding their order of preferences. By recording the ensuing discussion, we hope to obtain qualitative insights into why children think items are fun, or not. By first having the children rank individually, we try to ensure that children properly form their own opinion first, and do not merely follow the suggestions of a peer in the group discussion. Finally, since the items that the children are ranking automatically lead the discussion in that direction, no questioning route has to be prepared. Still, we expect that a moderator is needed to guide the discussion if children reach a dead end and cannot resolve an issue themselves, and to make sure that all children participate equally in the discussion.

In short, the GroupSorter method is executed as follows:

- Researcher: choose items for ranking; note that the choice of these items will have a direct effect on the direction of the focus group discussion.
- Let the children rank the items individually. Note that we did not allow the children to sit side-by-side, to avoid copying of the rankings.
- Repeat the ranking during the focus group. Record the ensuing discussion.
- Analyze results. We recommend not transcribing the data verbatim but only capturing comments when they provide (1) an argument, more than just "I liked it" or "I didn't like it", (2) an example of an experience or (3) a suggestion for improvement. Next, these comments can be analyzed using conventional content analysis [58].

So, the main result of this method will consist of qualitative comments that can be used to analyze the game play and experience.
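The main output is thus qualitative, but the individual rankings gathered in the first step can also serve as a simple check on the group result, for example to see whether the group ranking diverges from the aggregated individual rankings (which might hint at cognitive tuning or peer pressure). The sketch below is our illustration and not part of GroupSorter as published; the item names are borrowed from the Save the safe study described later, while the rank values are invented.

```python
# Illustrative sketch (not part of the published GroupSorter method): aggregate
# the individual rankings and compare them with the group ranking.
# Rank 1 = most fun. Rank values below are hypothetical.
from statistics import mean

individual = {                       # item -> ranks given by each child
    "the belt":   [1, 2, 1, 3],
    "running":    [2, 1, 3, 1],
    "the safe":   [3, 3, 2, 2],
    "discussing": [4, 4, 4, 4],
}
group = {"the belt": 1, "running": 2, "the safe": 3, "discussing": 4}

aggregate = {item: mean(ranks) for item, ranks in individual.items()}

def spearman(rank_x, rank_y):
    """Spearman rank correlation for two untied rankings over the same items."""
    items = list(rank_x)
    n = len(items)
    d2 = sum((rank_x[i] - rank_y[i]) ** 2 for i in items)
    return 1 - 6 * d2 / (n * (n * n - 1))

# Rank the aggregated means (1 = lowest mean rank = most fun overall).
order = sorted(aggregate, key=aggregate.get)
agg_rank = {item: i + 1 for i, item in enumerate(order)}
print(agg_rank)
print(spearman(agg_rank, group))   # close to 1.0 when the group ranking follows the individuals
```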


4. Case studies

The methods described in the previous section have been applied in three case studies, which will be described in this section. For each case study we will briefly describe the game itself and the larger context in which the study took place. Then we will focus on our experiences with OPOS and GroupSorter during the case study.

4.1. LightHouse

4.1.1. Context and setup

In contrast to the cases we describe next, where the focus was to evaluate a newly designed Head Up Game, this case study focused on the assessment of the reliability of OPOS. To that end, a game called LightHouse was developed in two versions: a game in which players competed individually against each other and one game in which players competed in teams.

One class of 24 children of a Dutch school participated in this experiment. Four groups of 5–7 children (age 10–11 years old) were formed; boys and girls were equally divided over the groups, though friends were grouped together to make the experiment more realistic. Two groups played the LightHouse game: one group played the individual version of the game, the other group the team version. The two other groups played tag and soccer respectively. The latter games were played and recorded to test whether OPOS was capable of detecting differences in play behavior for games other than Head Up Games.

Each game was captured on video by one stationary camera. Then we applied OPOS to analyze the behaviors evoked by the traditional games (tag and soccer), and both versions of LightHouse. Also, we applied OPOS to a short video of two children playing a video game on an XBox, as totally different behaviors are displayed in such a game. Finally, we compared the results of OPOS in terms of the evoked physical activity and social interaction, as well as the visual focus of the players.

A secondary aim of the study was to compare the game experience of a technology enhanced (Head Up) game to a non-technological version of the same game. Therefore, the LightHouse game was redesigned into a non-technological adaptation of the original game. Two groups of children played both games, and to compare the children's experiences GroupSorter was applied.

4.1.2. The LightHouse game

LightHouse [52] is a pirate game in which players have to collect treasures (wooden coins) from treasure islands, represented by chalked circles on the ground. A physical LightHouse object is standing on another island, and emits a rotating light. Players should avoid being 'seen' by this LightHouse to keep their treasures. The LightHouse game can be played individually, or in teams of two. See Fig. 1 for an impression of the LightHouse game.

Fig. 1. Children playing the LightHouse game.

4.1.3. Experiences with OPOS

For the evaluation we have compared two versions of LightHouse, the individual and group version as described above, to three other games, namely tag, soccer, and a video game. The results of this comparison are shown in Figs. 2–4. From the graphs we can see that, quite unsurprisingly, the video game scores very low for physical activity. In the other four outdoor games players are much more physically active. Furthermore, in the category 'focus' the game tag scores high for 'looking at other players'; this can be explained by the fact that there were no game objects to look at. Finally, comparing the other three outdoor games, the behavior 'looking at other players' occurred significantly more often in the LightHouse games than in the soccer game. Again, this can easily be explained: in the soccer game players focused mostly on the ball. Overall, we can conclude that the observation scheme is sufficiently sensitive to pick up differences between different types of games.

While all the data was coded by a single coder, a selection of the data was coded by two additional coders as well (see also [52]) to calculate inter-coder reliability. The resulting Kappa coefficients indicated acceptable agreement (according to [59]) for physical activity, moderate agreement for focus, and low agreement for social interaction. The low agreement in the class social interaction may have been caused by the way we recorded the games. The additional coders indicated that it was especially hard to interpret whether social interaction behaviors should be coded as functional or non-functional to the game, as it was often difficult to understand exactly what the players were saying. More sophisticated recording equipment such as multiple cameras could possibly overcome this problem.

Fig. 2. The results in the class 'focus'. Behaviors in average percentage of the total time.

Fig. 3. The results in the class 'physical activity'. Behaviors in average percentage of the total time.

Fig. 4. The results in the class 'social interaction'. Behaviors in average number of events per minute.

4.1.4. Experiences with GroupSorter

Besides evaluating the applicability of OPOS, we examined the potential added value of technology and its potential downsides by comparing the technology-enhanced version of the LightHouse game to the non-technological version of the LightHouse game, i.e. without technological enhancements. In the technology-enhanced version of LightHouse, players had to collect treasures without being seen by a 'physical' LightHouse. This game object emits a rotating light and makes a horn sound when a player is 'seen'. In the non-technological version, the LightHouse is acted out by one of the researchers, who created the rotating light with a torch and screamed when a player was 'seen'. All other artifacts and rules are the same for both games.

Two groups of six children (age 10–11) played both the technology-enhanced and the non-technological version of the LightHouse game, and did a GroupSorting exercise afterwards to enable qualitative analysis of the game experience of both games. For the GroupSorter, we visualized four major activities of the game on cards. Each activity was represented on two cards, one showing the technological version of LightHouse, and one showing the non-technological version. We first used these cards to explain the GroupSorting procedure, after which we gave each child a sheet of paper showing all eight cards (the order of the cards was systematically varied to prevent order effects). The children were asked to make their individual ranking on these sheets, and then come together to make the group ranking using the large cards. A researcher moderated the discussion during the group session, by frequently asking why certain decisions had been made.

As intended, the discussions focused mainly on the differences between the technology-enhanced and the non-technological version of the game. Having the large cards at hand enabled the children to clearly distinguish and discuss both versions of the game. The discussions were recorded on video and later analyzed for quotes revealing differences between the two versions. In total, close to 30 quotes were captured, which were analyzed through conventional content analysis [58]. The general opinion was that the technology-enhanced version was more exciting or 'cool'.

The moderators in this study felt that the discussions were sometimes influenced by peer pressure: one child would make a suggestion and the others would agree without discussion. In such cases, the moderators tried to stimulate discussion by asking specific participants for their thoughts. The fact that children already had their personal ranking at hand enabled the moderator to ask specific questions (e.g., 'you had a different preference in your personal ranking, why is that?') to encourage discussion. This may have been harder had 'traditional' focus group techniques been used. One group discussion was moderated by the researcher who supervised the technology-enhanced version of LightHouse and the other by the supervisor of the non-technological version. We expected that children might argue in favor of the game that their interviewer had supervised. However, when comparing the opinions of both groups, this does not seem to have influenced the results.

4.2. Save the safe

4.2.1. Context and setup

The aim of this study was to investigate the effect of a tangible versus a virtual game element on the game experience. To this end, we designed two versions of the Save the safe game. 27 children (8–9 years) played both versions of the game, in four groups of eight children per session. Genders were evenly distributed over the sessions, and the order of play was systematically varied over the sessions. Each session started with a training game, then both versions of the game were played, and for the final game the children decided which version of the game they wanted to play, similar to the AgainAgain method [50], where children are asked in a survey what product they would want to use again. All sessions were videotaped using two cameras at corners of the play field. Also, two children were equipped with head mounted cameras. Each group participated in a GroupSorter session after playing both versions of the game. The evaluation was conducted at a school, where children from one class participated. The evaluation took place over the course of one school day.

4.2.2. The Save the safe game

Save the safe was a refinement of the game Stop the Bomb by Hendrix et al. [60]. The game is a team game that revolves around a safe and its key. Players are divided into two teams, one to protect the key and thus the safe, and one to try to capture the key to open the safe. The team that completes its goal within the allotted playing time wins the game. Each player is equipped with a belt-mounted device providing output through LEDs and a vibration motor. At the beginning of the game the players are randomly assigned to either team, indicated by the LEDs on the belt.

In the study the key was represented either virtually, using vibration motors in the players' belts, or physically, using a ball. Only one player can have the key at a time. The virtual key could be exchanged between players by holding the belts near each other, while the physical key (the ball) could be thrown between players. A more elaborate explanation of Save the safe can be found in [61]. See Fig. 5 for an impression of the game.

Fig. 5. Children playing Save the Safe; inset: belt.

4.2.3. Experiences with OPOS

Based on the experiences during the evaluation of the LightHouse game, we decided to apply OPOS unchanged in the evaluation of Save the safe. Despite the low inter-coder reliability in the social interaction class found during the LightHouse evaluation case, we believe it is valuable to distinguish functional and non-functional social interaction, in order to allow more detailed comparison between different games. To capture social interactions more clearly, we decided to videotape the sessions by two stationary cameras placed on two corners of the playing field, which was a school playing field.

The results of the comparison between the tangible and virtual version of the game are shown in Figs. 6 and 7 and Table 4. We can conclude that, overall, the children were equally active in the two game versions, though there is a significant main effect of game type when analyzing the percentage of intensive physical activity: in the game with the virtual element, children engaged in more intensive physical activity than in the game with the tangible element. Furthermore, from the results in the category 'focus', we can conclude that the children looked much more at other players in the virtual game than in the tangible game. Vice versa, children looked more at game objects in the tangible game, which can be easily explained since a ball was used in that game.
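To illustrate how such percentages can be derived from OPOS-coded video, the sketch below (in Python, chosen here purely for illustration; it is not part of the published method) aggregates hypothetical coded segments for one player into per-behavior percentages of the total game time. The segment boundaries, labels and helper name are assumptions made for this example only.

from collections import defaultdict

def behavior_percentages(segments, total_time_s):
    # segments: list of (start_s, end_s, label) tuples coded from the video
    time_per_label = defaultdict(float)
    for start, end, label in segments:
        time_per_label[label] += end - start
    return {label: 100.0 * t / total_time_s for label, t in time_per_label.items()}

# Made-up coding of one player in a five-minute (300 s) game, class 'physical activity'
coded = [
    (0, 40, "no physical activity"),
    (40, 220, "non-intensive physical activity"),
    (220, 300, "intensive physical activity"),
]
print(behavior_percentages(coded, total_time_s=300))
# e.g. {'no physical activity': 13.3, 'non-intensive physical activity': 60.0, 'intensive physical activity': 26.7}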

Fig. 6. The results in the class 'physical activity' for the Save the Safe game. Behaviors in average percentage of the total time.

All data was coded by a first observer, while 25% of the data was coded by a second observer to calculate inter-rater reliability. According to [59], the resulting Kappa coefficients for the categories physical activity and focus indicate moderate agreement between coders, whereas the coefficient for social interaction indicates no agreement. We must conclude that using multiple cameras did not make it easier to code social interaction behaviors. The footage of the head-mounted cameras was not used in this analysis. We had some trouble with the fitting of the helmets; they turned out to be too loosely fitted to the children's heads, resulting in unusable footage.
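The inter-rater check reported above can in principle be reproduced by computing Cohen's kappa over the two coders' labels for the same time samples, and interpreting the coefficient with the Landis and Koch [59] bands. The snippet below is a hedged sketch of such a computation; the label values are invented and scikit-learn is merely an assumed analysis tool, not the one used in the study.

from sklearn.metrics import cohen_kappa_score

# Made-up 'social interaction' labels assigned by two coders to the same samples
coder_a = ["functional", "none", "none", "positive", "functional", "none"]
coder_b = ["functional", "none", "positive", "positive", "none", "none"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")  # interpret with Landis and Koch [59]: 0.41-0.60 = moderate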

4.2.4. Experiences with GroupSorter

We applied the GroupSorter method in order to find out how different elements of the game contributed to the overall game experience.

The items that we asked the children to rate from most fun to least fun were: the belt, the safe, running, discussing, playing together, the ball, the team, and winning. Note that the elements of both games are mixed together to create eight items for one ranking. This allows us to compare the elements in relation to each other; had we asked the children to rank the elements per game, this comparison would not have been possible. We presented the items as text with icons on stickers that children could stick on a piece of paper.

Children seemed to have no problems understanding the concepts they were asked to rank, and they could perform the task quite easily. Immediately after the individual ranking, the group discussion took place (see Fig. 8). The discussion was recorded and we analyzed it using conventional content analysis [58].
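Purely as an illustration of the quantitative side of GroupSorter (in the study the individual sheets served as discussion input and were not fed into an automatic procedure), the individual rankings could be summarized by average rank before the group session, for example to spot items the group is likely to disagree on. The item names below are the Save the safe items; the ranking values are invented.

import pandas as pd

items = ["belt", "safe", "running", "discussing",
         "playing together", "ball", "team", "winning"]

# One column per child; rank 1 = most fun, 8 = least fun (made-up values)
individual_ranks = pd.DataFrame({
    "child_1": [1, 6, 3, 8, 2, 4, 5, 7],
    "child_2": [2, 5, 4, 7, 1, 3, 6, 8],
    "child_3": [1, 7, 2, 8, 3, 5, 4, 6],
}, index=items)

mean_rank = individual_ranks.mean(axis=1).sort_values()
disagreement = individual_ranks.std(axis=1).sort_values(ascending=False)
print(mean_rank)      # items from most to least fun on average
print(disagreement)   # items the children disagree on most, worth probing in the discussion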

All items that were on the list to be ranked were mentioned in the discussion, although some items were discussed more intensively than others. This mainly happened when participants did not agree on an item. The ensuing discussion gave much insightful information about the reasons for liking or disliking a particular element of the game. No new items were mentioned in the discussion, but the discussion did reveal some unexpected game dynamics of game elements influencing each other in a way that we had not foreseen when designing the game. Finally, the things that were said during the discussion were in agreement with the rankings the children had given. For example, many positive remarks were given on the belt, which scored high in the ranking.

Fig. 7. The results in the class 'focus' for Save the Safe. Behaviors in average percentage of the total time.

Table 4
The results in the class 'social interaction' for Save the Safe.

              Type                Order               Interaction
              F(1,22)    p        F(1,22)    p        F(1,22)    p
Functional    0.536      0.472    0.249      0.623    0.021      0.886
Non-player    4.789      0.040*   1.535      0.228    2.702      0.114
Positive(a)   –          –        –          –        –          –
Negative      1.565      0.224    0.634      0.434    8.062      0.010

* Significant at the 5% level.
(a) Too few observations.
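The F(1,22) values in Table 4 are consistent with a mixed analysis of variance with game type (tangible vs. virtual) as within-subject factor and play order as between-subject factor. The sketch below shows one way such an analysis could be set up; the column names, the use of the pingouin package and the data values are assumptions made for illustration, not the authors' actual analysis pipeline.

import pandas as pd
import pingouin as pg

# Long format: one row per child per game version (made-up observation counts)
df = pd.DataFrame({
    "child":     [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    "order":     ["tangible-first"] * 6 + ["virtual-first"] * 6,
    "game_type": ["tangible", "virtual"] * 6,
    "functional_interactions": [5, 7, 4, 6, 8, 8, 3, 5, 6, 9, 4, 4],
})

aov = pg.mixed_anova(data=df, dv="functional_interactions",
                     within="game_type", between="order", subject="child")
print(aov[["Source", "F", "p-unc"]])  # rows for order, game type and their interaction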

As opposed to the evaluation of LightHouse, in the evaluation of Save the safe the moderators did not note occurrences of peer pressure; on the contrary, children apparently felt quite at ease voicing a different opinion than their peers, and were able to work together to make a ranking, as the following discussion shows:

[for the sake of clarity, the names of the items that the children were ranking are underlined]

[child 1] I thought playing together was most fun

[child 2] [on my ranking] I have the belt on one, and winning on two

[child 3] oh, but I find winning is not so important

[child 2] shall we put the belt here then [points to a spot on the ranking chart]?

[another child starts to put items on the chart without consulting the others]

[child 4] Robert, don't fill it in yourself, we have to agree first!

Fig. 8. Children participating in GroupSorter after playing Save the Safe.

Although it might be difficult to extract from the above transcript, from the audio it was clear that children were able to freely voice their opinions and discussed until they reached a common agreement on the ranking. Though one child made an attempt to impose his own ranking on the rest, he was immediately corrected by his peers; no intervention from the moderator was necessary. Nevertheless, we noticed that a moderator is absolutely necessary. In one particular case a child started bullying another child, as they had contrasting opinions. In that case the moderator intervened to correct the behavior of the children. Furthermore, in contrast with 'normal' focus groups, the moderator did not have a set of guiding questions. The items that had to be ranked automatically provided guidance in the discussion. Most of the time, when children did not agree on where to rank an item, a discussion would inevitably follow. However, on items on which the children were in perfect agreement, comments such as "yes, that's it" were often given without further explanation. The moderator would then prompt the children to elaborate on their choice. The transcript below shows this:

[mod] Which game did you like best?

[child 1] I liked the game with the belt most

[other children agree]

[mod] Does everybody agree? [child 2], what do you think?

[child 2] No, I found the game with the ball the most fun

[... more discussion on which game was most fun ...]

[mod] Why did you find the game with the belt the most fun?

[child 3] I liked the sensation of the belt, and the fact that I had to run faster [compared to the game with the ball]


Fig. 9. Children playing HeartBeat. Inset: the HeartBeat game device.


4.3. HeartBeat

4.3.1. Context and setup

In this case study we incorporated biofeedback in an outdoor game. The game was developed in an iterative process, involving children early on in the process. The resulting game was play tested with 32 children, ages 11–13. The aim of the evaluation was to compare two versions of the game, one incorporating heart rate measurement and one not. Participants played the game in groups of eight players and each player experienced both versions. The game play was recorded and the video footage was evaluated using OPOS. Two classes played the game during a school day. The game was played in a park near their school and evaluations were conducted in the school. Directly after each game, children filled out a Game Experience Questionnaire (GEQ) [62]. Next, after playing both versions of the game, GroupSorter was used for evaluation.

4.3.2. The HeartBeat game

HeartBeat (see Fig. 9) is played in two teams. Players are randomly assigned to either the attacking team or the defending team. The goal of the defending team is to defend a virtual treasure, while the goal of the attacking team is to capture this treasure. To play this game, each player uses a game device. This device is capable of broadcasting wireless signals and measuring heart rate. When the heart rate of a player exceeds a preset threshold, his device will emit wireless signals in correspondence with the player's heart rate. Devices of enemy players in the vicinity receiving this signal will beep. Further details on this game can be found in [63].
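As a rough sketch of the device behavior just described (the class and parameter names are invented, the threshold value is an assumption, and the real devices are embedded hardware rather than Python), the logic amounts to: broadcast when the heart rate exceeds the threshold, and beep on receiving a broadcast from the enemy team.

THRESHOLD_BPM = 120  # assumed value; the paper only states 'a preset threshold'

class GameDevice:
    """Sketch of one player's HeartBeat game device."""

    def __init__(self, player_id: str, team: str):
        self.player_id = player_id
        self.team = team

    def tick(self, heart_rate_bpm: int):
        """Return a broadcast message when the heart rate trips the threshold."""
        if heart_rate_bpm > THRESHOLD_BPM:
            return {"team": self.team, "bpm": heart_rate_bpm}
        return None

    def on_receive(self, message):
        """Devices of enemy players in the vicinity beep on reception."""
        if message is not None and message["team"] != self.team:
            print(f"device {self.player_id}: beep!")

attacker = GameDevice("A1", team="attackers")
defender = GameDevice("D1", team="defenders")
defender.on_receive(attacker.tick(heart_rate_bpm=135))  # the defender's device beeps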

4.3.3. Experiences with OPOS

The game of HeartBeat was played in a large outdoor park, with bushes and trees for the children to hide behind, making it impossible to capture the play behavior with one or two stationary cameras, as had been done in the game evaluations of LightHouse and Save the safe. To solve this issue, two players in each game received a head-mounted camera. This made it possible to review the play behavior from one player's point of view.

From the play sessions 10 videos were selected for observation: five videos of the game played with the heart rate monitor and five videos of the game played without the heart rate monitor. One observer coded the videos, using only the categories of physical activity and social interaction. The category 'general' was irrelevant, since the camera was head-mounted. The category of focus was left out, since it was impossible to determine exactly what the player was looking at from the head-mounted camera footage (e.g., a player looking down can be watching his footsteps to see where he is running, but can alternatively be watching his game device, or trying to stay hidden from other players). A selection of the material was also coded by a second coder, from which the inter-rater reliability (Kappa) was calculated for the categories physical activity and social interaction. Analysis showed no major differences between the conditions. In the game without the heart rate monitor children ran 22.7% of the time, compared to 34.9% in the game with the heart rate monitor. The results in this case served as a proof of concept for Head Up Games. Our main concern in coding the footage was to show that our implementation of heart rate monitoring did not have a negative effect on physical activity. Since the difference between the two game versions was minimal, the results from OPOS did not show significant differences.

4.3.4. Experiences with GroupSorter

In the evaluation of HeartBeat the same approach as in Save the safe was used. Children sorted five game elements according to how much 'fun' they had experienced with them. The items referred to various physical, social and technical game elements, such as tagging other players, random team allocation, or 'hearing' other players through the game device. Four focus groups were held, three of them with eight participants and one with three participants. The average length of a focus group session was 15 min. Directly after playing the two versions of the game, the children were taken into an empty classroom where the discussions were held. Children first filled out their own ranking scheme and subsequently discussed it with the others.

In our experience the atmosphere in the focus groups was mixed. Two moderators each conducted two sessions with different groups. There was a noticeable difference in atmosphere between the two moderators. One moderator had very calm sessions, where children waited for each other and generally agreed upon most topics. The other moderator had sessions with more discussion. Children disagreed with each other and felt free to speak aloud to describe their own experience or opinion. When reviewing the focus group footage we observed a fairly balanced distribution of speaking turns. Although more assertive children obviously accounted for more comments, the individual ranking scheme allowed the moderators to draw silent children into the discussion. Only in very few cases did these silent children agree to everything. We also observed that the discussion revolved exclusively around the chosen game elements. Children did not bring other topics into the discussion.

In general the GroupSorter exercise provided us with numerous examples of game situations. Children mostly argued why they liked, or didn't like, game elements based on what they had experienced. For example, one child reported she didn't like a technical feature, because it failed several times during play. When the moderator inquired whether she would have liked the feature had it not failed, she agreed.

The moderators were given a set of guidelines and back-up questions. One very important rule in the discussion was that only the moderator was allowed to place or change the order of the cards. Children first had to convince others that a game element should be ranked higher or lower, before the moderator would change it.

5. Reflections

5.1. OPOS

OPOS provides a concrete and operational way to describe salient aspects of outdoor play, more specifically for Head Up Games. This is a new class of games and, as far as we are aware, there are no existing observation schemes that measure social interaction and physical activity conjointly for outdoor, multiplayer games. In that light, OPOS must be viewed as a first attempt to do so, and we argue that for measuring focus and physical activity OPOS is successful, though it is less so for measuring social interaction. However, we feel that this should be attributed to the way we have captured the data, not to the observation scheme itself.

Obtaining data, in our case video footage, is not as straightforward as in laboratory-based evaluations. Players tend to disperse and move fast. Depending on the game and the play area, multiple cameras could be used; alternatively, individual players may be tracked and coding play behaviors may then be done for individual players. In the evaluation of Save the safe and HeartBeat, some players were equipped with wireless motion cameras that were mounted on helmets. This did allow us to capture the play behavior of one player for the whole duration of the game, in contrast to stationary cameras, where children would sometimes disappear out of sight. Also, because the head-mounted camera was much closer to the player, it did a better job at capturing the 'feel' of the play. On the other hand, the footage it provided gave a limited view of the playing field, as it captures the perspective of a single player only, compared to a stationary camera, which can capture multiple players simultaneously. Sometimes the footage was ambiguous, as it was not always clear what the player was focusing on. Summarizing, there is no solution that is best for every situation; we advise selecting the type of camera (head-mounted or stationary) taking into account the outdoor environment in which the game is to be evaluated.

It is worth mentioning that applying OPOS is a very time-consuming activity. The footage has to be reviewed for each player individually, and for each player three classes of behavior need to be coded. For example, a five-minute game with eight players would take approximately 8 × 3 × 5 = 120 min of coding.

Our experiences with applying OPOS have shown that it is relatively easy to code the classes 'physical activity' and 'focus'. However, in our studies of LightHouse and Save the safe, inter-rater reliability was low for the class 'social interaction'. The main reason for this is that it was very hard to judge from the footage of stationary cameras whether the children were engaged in social interaction, since the children's faces were not properly discernible. Also, their verbal utterances were not very audible, because the children would often be too far away from the camera, making it difficult to distinguish positive and negative interactions. As a direct consequence, the results for 'social interaction' are not reliable. In contrast, in the study of HeartBeat the inter-rater reliability of coding the class social interaction was higher. This is due to the fact that the footage was captured by the head-mounted cameras and therefore much better recorded what the children were saying. We think that another way to improve capturing social interaction is to separately capture audio recordings for analysis. To this end, every player could be equipped with a small audio recording device. This would also solve the problem of children running in and out of sight of the camera, at least for the class 'social interaction'.

In related work on measuring physical activity, another means of data capturing that is often referred to is deploying accelerometers (e.g., [64]), though these can be expensive [30]. Using an accelerometer would probably result in even more accurate measurements of the levels of physical activity, but this would come at the cost of losing the rich contextual information that is provided when using observational data [65]. Though data from accelerometry would speed up the process of analyzing physical activity levels, we would still argue to combine it with observational data, to correlate the levels of physical activity with interesting events in the game play.

5.2. GroupSorter

Overall, we conclude that GroupSorter can provide valuable feedback on the game during the design process. GroupSorter is specifically designed for interviewing groups of children, while avoiding common problems that are known to arise when doing so. It provides both quantitative and qualitative feedback: the quantitative result consists of a ranking of elements of a game on the construct 'fun'. The most valuable data from GroupSorter, though, is the qualitative data: the method elicits qualitative comments from children regarding the game play and experience that designers can use to improve their product in the next design cycle of the development. However, there are a number of general issues that need attention when applying this technique, which we will discuss here.

First, the items to be ranked must be picked carefully, since they will largely influence the direction of the focus group discussion. From the analysis we conclude that the discussions will remain centered around the topics provided to the children; new items did not surface in the discussions.

In our evaluations we used icons as well as text to represent each item. This clarified the items to be ranked; from the focus group discussions we did not get the impression that children had misunderstood or misinterpreted the items. We would like to underline the importance of clear visualizations of items to be ranked, especially when multiple items are alike and may be confused by the participants (such as in the LightHouse study, where we compared a technology-enhanced and a non-technological version of the same game). Large cards may support the group discussion in such cases.

Though the use of items to guide the discussion will lessen the effect of children giving 'desired' answers [55,66], and no interview questions need to be prepared, the moderator still needs to take care to prompt the children for explanations of their rankings. Often, while the group is ranking, explanations are given spontaneously, but sometimes children would immediately agree on an item and no further details would be given. At such a moment, the moderator prompted the children to elaborate on their ranking, to obtain more insight.

We had no trouble annotating the children's discussion from the audio recording, although at times children would talk at the same time. A way to deal with this is to record each child's input individually, by equipping each child with a personal microphone, as for example Sluis-Thiescheffer et al. have done [67].

A possible pitfall of a group-based survey technique is that some children might be dominated by others, and their opinions might go unvoiced or be dismissed quickly [68]. We also experienced this in the LightHouse study, where we felt that peer pressure had influenced the focus group session. Clearly, more vocal and assertive children can influence the final group sort more, but it is up to the moderator to help all children voice their thoughts and feel that their individual opinion is valued. For example, the moderator can address a child directly to invite him into the conversation, while quieting the more vocal children. Also, as we expected, individually ranking the items before the group discussion opens the opportunity to invite children into the discussion. By comparing their ranking to the group ranking scheme, the moderator can ask why a child answered differently. Finally, because children have already made their ranking individually, there is less chance of the effect called 'cognitive tuning' [56], where children could be influenced by other children's remarks before making up their own mind about the items. Particularly in the Save the safe and HeartBeat case studies, we experienced that referring to the children's individual rankings during the focus group encourages children to formulate their own opinions.

6. Discussion and conclusion

In this paper we have described two methods that we found useful when evaluating outdoor games. We have described OPOS, an observation scheme that quantifies the outdoor play behaviors in which we are interested within the scope of HUGs. GroupSorter is a method for gathering subjective data from groups of children. In the previous section we have reflected on the use of both methods in three case studies. In this section we discuss the applicability of the methods to other contexts and participants.

6.1. Applying the methods in other contexts

The methods described have been used to evaluate outdoor games. With respect to the applicability of these methods outside game development we can say the following: as the observation scheme OPOS is strongly linked with outdoor play, its use beyond play- or outdoor-related applications is probably limited. In contrast, the GroupSorter method is much more generic and can be used for any kind of evaluation that involves multiple children. Also, as we have shown, it generates both quantitative and qualitative data, and, in case the data are in concordance, the quantitative data can substantiate the qualitative data and vice versa, providing more solid support for the findings.

6.2. Applying the methods for other age groups

Both methods have been used with children in the age range of 7–13 years old, and have been found appropriate for this age group. However, since OPOS measures behavior only, without requiring direct input from the users, we argue that it is equally usable for other ages, both younger and older. Finally, the GroupSorter method is similar to a focus group, and we deduce from the literature that focus groups can be successfully applied for ages six and up [54].

6.3. How to interpret the results of OPOS?

By using OPOS, objective quantitative data is gathered about the play behavior that emerges during game play. However, one should keep in mind that in this case ''more'' is not always ''better''. For example, in our vision of HUGs we promote the aspect of physical activity in outdoor play. However, creating a game that forces players to physically exhaust themselves (and thus generates a high value on the 'physical activity' scale of OPOS) does not automatically result in a game that is most fun, which is another focus point in the HUGs concept.

We created OPOS to objectively quantify the play behaviors we want to encourage with Head Up Games. To this end, it has proven to be successful, though improvements in data capturing are still needed: because of the manual analysis of the video material it is quite time consuming. Also, as we have experienced, it is not always possible to capture all behaviors on tape, as children might be hidden or obscured by large objects in the play field. Therefore, we are looking into how the same information can be captured in a different way. For a next experiment we are planning to equip the players' devices with accelerometers that log the players' movements. Subsequently, this data can be analyzed for the level of physical activity using a mathematical package, which we expect to be much more time efficient.
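As a hedged sketch of that planned analysis (the thresholds, sampling details and sample values below are invented; the paper only states the intention to log and analyze accelerometer data), the logged samples could be mapped onto the three OPOS physical-activity levels by thresholding the movement magnitude:

import numpy as np

def classify_activity(accel_xyz, low=1.5, high=6.0):
    """accel_xyz: (n, 3) array of acceleration samples in m/s^2, gravity removed."""
    magnitude = np.linalg.norm(accel_xyz, axis=1)
    levels = np.full(len(magnitude), "non-intensive", dtype=object)
    levels[magnitude < low] = "none"
    levels[magnitude >= high] = "intensive"
    return levels

samples = np.array([[0.1, 0.0, 0.2],   # standing still
                    [1.8, 0.5, 0.9],   # walking
                    [5.0, 4.2, 3.1]])  # running
print(classify_activity(samples))      # ['none' 'non-intensive' 'intensive']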

6.4. Future work

We created OPOS because we could not find any other observation scheme that suited our needs, and as pervasive outdoor gaming is a growing area of research, we argue that OPOS is a valuable, initial, contribution to the research community. At the moment, the difficulties with capturing social interaction are a limitation of the tool and remain an interesting topic for further research. In future work, we plan to apply OPOS to study what (interaction) elements of a game lead to what play behaviors, to create a fundamental understanding of the game dynamics of outdoor play. Ultimately, our goal is to extend existing guidelines and design patterns for games (see for example [69]), which mostly target computer games, with guidelines specifically for technology-enhanced outdoor play.

Acknowledgements

We would like to thank all the children and teachers that participated and cooperated in our studies. This research is supported by the European Community under the IST program (FP6-2004-IST-4), Project PASION.

References

[1] S. Benford, A. Crabtree, M. Flintham, A. Drozd, R. Anastasi, M. Paxton, N. Tandavanitj, M. Adams, J. Row-Farr, Can you see me now?, ACM Transactions on Computer-Human Interaction 13 (1) (2006) 100–133.
[2] N. Nova, F. Girardin, G. Molinari, P. Dillenbourg, The underwhelming effects of automatic location-awareness on collaboration in a pervasive game, in: International Conference on Cooperative Systems Design (COOP 2006), 2006, pp. 224–238.
[3] M. Bell, M. Chalmers, L. Barkhuus, M. Hall, S. Sherwood, P. Tennent, B. Brown, D. Rowland, S. Benford, Interweaving mobile games with everyday life, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Montréal, Québec, Canada, 2006, pp. 417–426.
[4] D.A. Norman, S.W. Draper, User Centered System Design: New Perspectives on Human-Computer Interaction, 1st ed., CRC Press, 1986.
[5] M. Hassenzahl, User Experience and Experience Design, 2011, pp. 1–14. Available from: <Interaction-Design.org>.
[6] I. Soute, P. Markopoulos, Head Up Games: the games of the future will look more like the games of the past, in: Human-Computer Interaction – INTERACT 2007, 2007, pp. 404–407.
[7] I. Soute, P. Markopoulos, R. Magielse, Head Up Games: combining the best of both worlds by merging traditional and digital play, Personal and Ubiquitous Computing 14 (5) (2009) 435–444.
[8] W.G. Scarlett, S.C. Naudeau, D. Salonius-Pasternak, I.C. Ponte, Children's Play, 1st ed., Sage Publications, Inc., 2004.
[9] F. Mueller, S. Agamanolis, R. Picard, Exertion interfaces: sports over a distance for social bonding and fun, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Ft. Lauderdale, Florida, USA, 2003, pp. 561–568.
[10] J. Soler-Adillon, N. Parés, Interactive slide: an interactive playground to promote physical activity and socialization of children, in: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems, Boston, MA, USA, 2009, pp. 2407–2416.

[11] T.M. Bekker, B.H. Eggen, Designing for children's physical play, in: CHI '08 Extended Abstracts on Human Factors in Computing Systems, Florence, Italy, 2008, pp. 2871–2876.
[12] P. Markopoulos, J.C. Read, S. Macfarlane, J. Höysniemi, Evaluating Children's Interactive Products: Principles and Practices for Interaction Designers, Morgan Kaufmann, 2008.
[13] K. Salen, E. Zimmerman, Rules of Play: Game Design Fundamentals, illustrated ed., The MIT Press, 2003.
[14] E. Mellou, Play theories: a contemporary review, Early Child Development and Care 102 (1) (1994) 91–100.
[15] M.B. Parten, Social participation among pre-school children, The Journal of Abnormal and Social Psychology 27 (3) (1932) 243–269.
[16] K.H. Rubin, The Play Observation Scale (POS), Center for Children, Relationships and Culture, University of Maryland, 2001.
[17] K.H. Rubin, K.S. Watson, T.W. Jambor, Free-play behaviors in preschool and kindergarten children, Child Development 49 (2) (1978) 534.
[18] E.M. Hetherington, M. Cox, R. Cox, Play and social interaction in children following divorce, Journal of Social Issues 35 (4) (1979) 26–49.
[19] P. Metin, The Effects of Traditional Playground Equipment Design in Children's Developmental Needs, Middle East Technical University, 2003.
[20] J. Huizinga, Homo Ludens: A Study of the Play-Element in Culture, Beacon Press, 1955.
[21] R. Caillois, M. Barash, Man, Play, and Games, University of Illinois Press, 2001.
[22] C.A. Anderson, N.L. Carnagey, Causal effects of violent sports video games on aggression: is it competitiveness or violent content?, Journal of Experimental Social Psychology 45 (4) (2009) 731–739.

[23] J.L. Sherry, The effects of violent video games on aggression, Human Communication Research 27 (3) (2001) 409–431.
[24] J. Kirriemuir, A. McFarlane, Literature Review in Games and Learning, Report 8, Bristol, Nesta Futurelab, 2004.
[25] J. Verhaegh, W. Fontijn, E. Aarts, W. Resing, In-game assessment and training of nonverbal cognitive skills using TagTiles, Personal and Ubiquitous Computing, 2012, pp. 1–10.
[26] A.J. Parkes, H.S. Raffle, H. Ishii, Topobo in the wild: longitudinal evaluations of educators appropriating a tangible interface, in: Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, 2008, pp. 1129–1138.
[27] Y.A.W. de Kort, W.A. IJsselsteijn, K. Poels, Digital games as social presence technology: development of the social presence in gaming questionnaire (SPGQ), in: Proceedings of PRESENCE 2007: The 10th International Workshop on Presence, 2007, pp. 195–203.
[28] V. Chen, H. Duh, P. Phuah, D. Lam, Enjoyment or engagement? Role of social interaction in playing massively multiplayer online role-playing games (MMORPGs), in: R. Harper, M. Rauterberg, M. Combetto (Eds.), Entertainment Computing – ICEC 2006, vol. 4161, Springer, Berlin/Heidelberg, 2006, pp. 262–267.
[29] W. Barendregt, M.M. Bekker, E. Baauw, Development and evaluation of the problem identification picture cards method, Cognition, Technology & Work 10 (2) (2007) 95–105.
[30] J. Dollman, A.D. Okely, L. Hardy, A. Timperio, J. Salmon, A.P. Hills, A hitchhiker's guide to assessing young people's physical activity: deciding what method to use, Journal of Science and Medicine in Sport 12 (5) (2009) 518–525.
[31] E. Haug, T. Torsheim, J.F. Sallis, O. Samdal, The characteristics of the outdoor school environment associated with physical activity, Health Education Research 25 (2) (2010) 248–256.
[32] T.L. McKenzie, S.J. Marshall, J.F. Sallis, T.L. Conway, Leisure-time physical activity in school environments: an observational study using SOPLAY, Preventive Medicine 30 (1) (2000) 70–77.
[33] F. Lantz, PacManhattan, in: F. Borries, S.P. Walz, M. Böttger (Eds.), Space Time Play, Birkhäuser Basel, 2007, pp. 262–263.
[34] S. Benford, D. Rowland, M. Flintham, A. Drozd, R. Hull, J. Reid, J. Morrison, K. Facer, Life on the edge: supporting collaboration in location-based experiences, in: Proceedings of CHI '05, Portland, Oregon, USA, 2005, pp. 721–730.
[35] Y. Rogers, M. Scaife, E. Harris, T. Phelps, S. Price, H. Smith, H. Muller, C. Randell, A. Moss, I. Taylor, D. Stanton, C. O'Malley, G. Corke, S. Gabrielli, Things aren't what they seem to be: innovation through technology inspiration, in: Proceedings of the 4th Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, London, England, 2002, pp. 373–378.

[36] T. Bekker, J. Sturm, B. Eggen, Designing playful interactions for social interaction and physical play, Personal and Ubiquitous Computing 14 (5) (2009) 385–396.
[37] T. Bekker, J. Sturm, Stimulating physical and social activity through open-ended play, in: Proceedings of the 8th International Conference on Interaction Design and Children, New York, NY, USA, 2009, pp. 309–312.
[38] S.E. Lindley, J. Le Couteur, N.L. Berthouze, Stirring up experience through movement in game play: effects on engagement and social behaviour, in: Proceedings of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, 2008, pp. 511–514.
[39] M.T. Boren, Conducting Verbal Protocols in Usability Testing: Theory and Practice, University of Washington, Seattle, 1999.
[40] J.S. Dumas, J. Redish, A Practical Guide to Usability Testing, Intellect Books, 1999.
[41] J. Rubin, D. Chisnell, Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests, John Wiley & Sons, 2008.
[42] S. McDonald, H.M. Edwards, T. Zhao, Exploring think-alouds in usability testing: an international survey, IEEE Transactions on Professional Communication 55 (1) (2012) 2–19.
[43] T. Boren, J. Ramey, Thinking aloud: reconciling theory and practice, IEEE Transactions on Professional Communication 43 (3) (2000) 261–278.
[44] A. Donker, P. Markopoulos, A comparison of think-aloud, questionnaires and interviews for testing usability with children, in: People and Computers, 2002, pp. 305–316.
[45] K. Jegers, Investigating the applicability of usability and playability heuristics for evaluation of pervasive games, in: Internet and Web Applications and Services, ICIW '08, Third International Conference on, 2008, pp. 656–661.
[46] T. Fullerton, C. Swain, S. Hoffman, Game Design Workshop, Focal Press, 2004.
[47] A.P.O.S. Vermeeren, K. den Bouwmeester, J. Aasman, H. de Ridder, DEVAN: a tool for detailed video analysis of user test data, Behaviour & Information Technology 21 (6) (2002) 403–423.

[48] W. Barendregt, M.M. Bekker, Developing a coding scheme for detecting usability and fun problems in computer games for young children, Behavior Research Methods 38 (3) (2006) 382–389.
[49] J.C. Read, Validating the fun toolkit: an instrument for measuring children's opinions of technology, Cognition, Technology & Work 10 (2) (2008) 119–128.
[50] J.C. Read, S. Macfarlane, Using the fun toolkit and other survey methods to gather opinions in child computer interaction, in: Proceedings of IDC '06, Tampere, Finland, 2006, pp. 81–88.
[51] Y. Rogers, S. Price, G. Fitzpatrick, R. Fleck, E. Harris, H. Smith, C. Randell, H. Muller, C. O'Malley, D. Stanton, M. Thompson, M. Weal, Ambient wood: designing new forms of digital augmentation for learning outdoors, in: Proceedings of IDC '04, Maryland, 2004, pp. 3–10.
[52] S. Bakker, P. Markopoulos, Y. de Kort, OPOS: an observation scheme for evaluating head-up play, in: Proceedings of NordiCHI '08, Lund, Sweden, 2008, pp. 33–42.
[53] R.A. Krueger, M.A. Casey, Focus Groups: A Practical Guide for Applied Research, SAGE, 2000.
[54] E. Hennessy, C. Heary, Exploring children's views through focus groups, in: Researching Children's Experience: Approaches and Methods, Sage Publications Ltd., 2005, pp. 236–252.
[55] D. Eder, L. Fingerson, Interviewing children and adolescents, in: J.F. Gubrium, J.A. Holstein (Eds.), Handbook of Interview Research: Context & Method, SAGE, 2002.
[56] E.F. Fern, Advanced Focus Group Research, Sage Publications Inc., 2001.
[57] L. Hanna, D. Neapolitan, K. Risden, Evaluating computer game concepts with children, in: Proceedings of the 2004 Conference on Interaction Design and Children: Building a Community, Maryland, 2004, pp. 49–56.

[58] H.-F. Hsieh, S.E. Shannon, Three approaches to qualitative content analysis, Qualitative Health Research 15 (9) (2005) 1277–1288.
[59] J.R. Landis, G.G. Koch, The measurement of observer agreement for categorical data, Biometrics 33 (1) (1977) 159–174.
[60] K. Hendrix, G. Yang, D. van de Mortel, T. Tijs, P. Markopoulos, Designing a head-up game for children, in: Proceedings of the HCI'08 Conference on People and Computers XXII, vol. 1, 2008, pp. 45–53.
[61] I. Soute, M. Kaptein, P. Markopoulos, Evaluating outdoor play for children: virtual vs. tangible game objects in pervasive games, in: Proceedings of the 8th International Conference on Interaction Design and Children, Como, Italy, 2009, pp. 250–253.
[62] K. Poels, W.A. IJsselsteijn, Y.A. de Kort, Development of the kids game experience questionnaire: a self-report instrument to assess digital game experiences in children, in: Poster Presentation at the Meaningful Play Conference, Michigan, 2008.
[63] R. Magielse, P. Markopoulos, HeartBeat: an outdoor pervasive game for children, in: Proceedings of the 27th International Conference on Human Factors in Computing Systems, Boston, MA, USA, 2009, pp. 2181–2184.
[64] A. Leal Penados, M. Gielen, P.-J. Stappers, T. Jongert, Get up and move: an interactive cuddly toy that stimulates physical activity, Personal and Ubiquitous Computing 14 (5) (2009) 397–406.
[65] P.D. Loprinzi, B.J. Cardinal, Measuring children's physical activity and sedentary behaviors, Journal of Exercise Science and Fitness 9 (1) (2011) 15–23.
[66] J. Garbarino, F.M. Stott, What Children Can Tell Us: Eliciting, Interpreting, and Evaluating Critical Information from Children, Jossey-Bass, 1992.
[67] W. Sluis-Thiescheffer, T. Bekker, B. Eggen, Comparing early design methods for children, in: Proceedings of the 6th International Conference on Interaction Design and Children, Aalborg, Denmark, 2007, pp. 17–24.
[68] A. Lewis, Group child interviews as a research tool, British Educational Research Journal 18 (4) (1992) 413.
[69] S. Bjork, J. Holopainen, Patterns in Game Design (Game Development Series), 1st ed., Charles River Media, 2004.