
J Multimodal User Interfaces (2012) 6:3–11
DOI 10.1007/s12193-011-0077-1

ORIGINAL PAPER

Visual SceneMaker—a tool for authoring interactive virtual characters

Patrick Gebhard · Gregor Mehlmann · Michael Kipp

Received: 3 December 2010 / Accepted: 5 November 2011 / Published online: 17 December 2011
© OpenInterface Association 2011

P. Gebhard (✉) · G. Mehlmann · M. Kipp
DFKI GmbH, Saarbrücken, Germany
e-mail: [email protected]

Abstract Creating interactive applications with multiple virtual characters comes along with many challenges that are related to different areas of expertise. The definition of context-sensitive interactive behavior requires expert programmers and often results in hard-to-maintain code. To tackle these challenges, we suggest a visual authoring approach for virtual character applications and present a revised version of our SceneMaker tool. In SceneMaker a separation of content and logic is enforced. In the revised version, the Visual SceneMaker, we introduce concurrency and specific history structures as key concepts to facilitate (1) clearly structured interactive behavior definition, (2) multiple character modeling, and (3) extensions to existing applications. The new integrated development environment allows sceneflow visualization and runtime modifications to support the development of interactive character applications in a rapid prototyping style. Finally, we present the results of a user study, which evaluates usability and the key concepts of the authoring tool.

Keywords Embodied conversational agents · Authoring tools · Modelling of interactive behavior

1 Introduction

The creation of believable interactive virtual characters involves expertise in many disciplines. It is a creative act which starts with application-specific goals that are realized by the choice of appropriate models, techniques and content, put together in order to create an overall consistent application. Virtual characters can enrich the interaction experience by showing engaging and consistent behavior. This comes along with a whole range of challenges, such as interaction design, emotion modeling, figure animation, and speech synthesis [9]. Research is carried out in a range of disciplines, including believable facial expressions, gesture animations and body movements of virtual characters [14], modeling of personality and emotion [10], and expressive speech synthesis [21]. However, to what extent virtual characters actually contribute to measurable benefits such as increased motivation or even performance is still hotly debated (cf. [12, 18]). This makes it even more important that virtual character applications are carefully designed, in close interaction with users, artists, and programmers.

Over the past years, several approaches for modeling the content and the interactive behavior of virtual characters have emerged. First, plan- and rule-based systems were in the focus of interest. Then, authoring systems were developed to exploit related expert knowledge in the areas of film or theater screenplay (e.g. dramatic acting and presentation). These systems are created to facilitate the authoring process and should enable non-computer experts to model believable natural behavior. One of the first authoring systems for interactive character applications is Improv [19]. Its behavior engine allows authors to create sophisticated rules defining how characters communicate, change their behavior, and make decisions. The system uses an “English-style” scripting language to define individual scripts. SCREAM [20] is a scripting tool to model story aspects as well as high-level interaction and comprises modules for emotion generation, regulation, and expression. Compared to our initial SceneMaker approach [8], which uses an author-centric approach with the primary focus of scripting at the story/plot level, the related projects use a character-centric approach in which the author defines a character’s goals, beliefs, and attitudes. These mental states determine the behavioral responses to the annotated communicative acts it receives. SCREAM is intended as a plug-in to task-specific agent systems such as interactive tutoring or entertainment systems for which it can decide on the kind of emotion expression and its intensity. There is no explicit support for scripting the behavior of multiple agents in a simple and intuitive way. Other author-centric approaches are the CSLU toolkit [17], SceneJo [22], DEAL [5], and CREACTOR [13]. The CSLU toolkit relies on a finite state approach for dialogue control, while SceneJo relies on the A.L.I.C.E. chatbot technology [1] for the control of verbal dialogues, which offers accessibility to non-experts in interactive dialogue programming. This technology offers AIML structures, which can be used to define textual dialogue contributions. A graphical user interface is provided for the modeling of structured conversations. DEAL is created as a research platform for exploring the challenges and potential benefits of combining elements from computer games, dialogue systems and language learning. A central concept is the dialogue manager, which controls the dialogues of game characters. The dialogue manager employs Harel statecharts [11] to describe an application’s dialogue structure and to model different aspects of dialogue behavior (e.g. conversational gestures and emotional expressions). CREACTOR is an authoring framework for virtual characters that enables authors to define various aspects of an interactive performance through an assisted authoring procedure. The project focuses on the modeling of consistent narration and character behavior using a graph representation. None of the mentioned authoring systems support concepts for action history and concurrency on the authoring level; DEAL provides these concepts, but only at the level of statecharts. With the Visual SceneMaker, new authoring concepts are introduced: concurrency, variable scoping, multiple interaction policies, a run-time history, and a new graphical integrated development environment (IDE). Before we lay the focus on these, we give a brief overview of the existing concepts.

Fig. 1 The English example scene Welcome with a variable, gesture and system commands

2 SceneMaker authoring concepts

SceneMaker’s central authoring paradigm is the separation of content (e.g. dialogue) and logic (e.g. conditional branching). The content is organized as a collection of scenes which are specified in a multi-modal scenescript resembling a movie script with dialogue utterances and stage directions for controlling gestures, postures, and facial expressions. The logic of an interactive performance and the interaction with virtual characters is controlled by a sceneflow.

Scenescript A scene definition (see Fig. 1) starts with the keyword Scene, followed by a language key (e.g. en for English), a scene identifier (e.g. Welcome), and a number of variables (e.g. $name). Variables can be used to produce variations or to create personalized versions. The scene body holds a number of turns and utterances, which have to start with a character’s name (e.g. Sam and Max). Utterances may hold variables, system actions (e.g. [camera upperbody]), and gesture commands (e.g. [point_to_user]).
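To make the scene format concrete, the following minimal sketch assembles the elements just listed into one scene definition. It is illustrative only: the dialogue wording, the exact punctuation of the scene header, and the // comment markers are our assumptions and do not reproduce the scene of Fig. 1 verbatim.

    Scene_en Welcome ($name):                          // keyword, language key, identifier, variable (assumed layout)
      Sam: [camera upperbody] Hello $name, welcome!    // system action plus a personalizing variable
      Max: Nice to meet you [point_to_user].           // gesture command embedded in the utterance

A scene like this would typically be one variation within a scenegroup, so that the blacklisting strategies described next can pick a different variation on each playback.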

A scenescript may provide a number of variations for each scene that are subsumed in a scenegroup. Different predefined and user-defined blacklisting strategies can be used to choose one scene of a scenegroup for execution. This selection mechanism increases dialogue variety and helps to avoid repetitive behavior of virtual characters.

Sceneflow A sceneflow is a hierarchical statechart variant specifying the logic and temporal order in which individual scenes are played, commands are executed, and user interactions are processed. It consists of different types of nodes and edges. A scenenode (see Fig. 2 (1)) has a name and a unique identifier. It holds scenegroup playback commands or statements specified in a simple scripting language format. These are type definitions and variable definitions, variable assignments, and function calls to predefined functions of the underlying implementation language (e.g. Java™ functions). A supernode (see Fig. 2 (2)) extends the functionality of scenenodes by creating a hierarchical structure. A supernode may contain scenenodes and supernodes that constitute its subautomaton (marked graphically with the shaded area)¹, with one defined startnode (see Fig. 2 (3)). The supernode hierarchy can be used for a scoping of types and variables: type definitions and variable definitions are inherited by all subnodes of a supernode.

Nodes can be seen as small pieces of code that structure the content of an interactive presentation, while edges specify how this content is linked together. Different branching strategies within the sceneflow, e.g. logical and temporal conditions or randomization, as well as different interaction policies, can be modelled by connecting nodes with different types of edges [8]. The IDE supports the usage of edges: the first edge connected to a node defines the allowed types of edges and the color of that node. This color code helps authors structure a sceneflow and keep it well-defined.

¹Within the IDE, the content of a supernode can be edited by selecting that supernode.

Fig. 2 Basic node types and edge types of sceneflows and the color code of nodes and edges

Epsilon edges represent unconditional transitions (see Fig. 2 (4)) and specify the order in which steps are performed and scenes are played back.

Timeout edges represent scheduled transitions and are labeled with a timeout value (see Fig. 2 (5)) influencing the temporal flow of a sceneflow’s execution.

Probabilistic edges represent transitions that are taken with certain probabilities and are labeled with probability values (see Fig. 2 (6)). They are used to create randomness and variability during the execution.

Conditional edges represent conditional transitions and are labeled with conditional expressions (see Fig. 2 (7) and (8)). They are used to create a branching structure in the sceneflow which describes different reactions to changes of environmental conditions, external events, or user interactions.

Interruptive edges are special conditional edges that are restricted to supernodes (see Fig. 2 (9)) and are used to interrupt ongoing activities.

Discussion The focus of the original SceneMaker is to enable authors to model typical interactive situations (e.g. with dialogue variations and distinct character behavior). This might be sufficient for some simple application setups, but there are limitations with regard to a more sophisticated modeling of dialogue, behavior of virtual characters, and interaction:

1. Difficult dialogue resumption. An external dialogue memory is needed to represent and store the current state of the dialogue topic or interaction. Without a dialogue memory, coherent dialogue resumptions are not feasible.

2. Coupled control of virtual characters. Individual control and synchronization of multiple characters with separate sceneflows is not supported. With scene gesture commands, multiple characters can be animated at the same time, but not independently of each other.

3. Costly extension of virtual character behavior. Adding new behavioral aspects, e.g. a specific gaze behavior in a multi-character application, requires adding commands in every used scene for each character.

3 The Visual SceneMaker IDE

The Visual SceneMaker IDE, shown in Fig. 3, enables authors to create, maintain, debug, and execute an application via graphical interface controls.

The IDE displays two working areas in the middle of the window, the sceneflow editor (see Fig. 3 (A)) and the scenescript editor (see Fig. 3 (B)). In order to build an application, authors can drag building blocks from the left side and drop them onto the respective editor. Building blocks for sceneflows (see Fig. 3 (C)) are nodes, edges, scenes, and predefined Java™ functions. Building blocks for scenescripts (see Fig. 3 (D)) are gestures, animations and system actions. All properties of nodes, such as type and variable definitions as well as function calls, can be modified to the right of the sceneflow working area (see Fig. 3 (E)). All defined types and variables are shown in a separate variable badge (see Fig. 3 (1)) within the sceneflow editor. To facilitate maintenance and testing, the execution of an application can be visualized either by highlighting the currently executed nodes, edges, and scenes in red at runtime (see Fig. 3 (2)) or by highlighting all past actions and execution paths in gray (see Fig. 3 (3)).

The Visual SceneMaker is implemented in Java™ using an interpreter approach for executing sceneflows and scenes. This allows the modification of the model and the direct observation of the ensuing effects at runtime.

Figure 4 shows the system architecture, with the IDE in the modeling environment (see Fig. 4 (A)) as well as the environments for the interpreter (see Fig. 4 (B)), plug-ins (see Fig. 4 (C)), and applications (see Fig. 4 (D)). Sceneplayers and plug-ins represent the input and output interfaces to external applications and have to be implemented by programmers. The figure presents a typical application configuration showing the Mary TTS² for speech output (see Fig. 4 (3)) and the Horde3D³ graphics engine for character rendering (see Fig. 4 (2)), as well as the Spin [7] parser for NLU (see Fig. 4 (4)). The Horde3D sceneplayer (see Fig. 4 (1)) is responsible for the translation of scenes into TTS and animation commands. The Spin request (see Fig. 4 (5)) forwards user input events to the sceneflow interpreter by changing the values of global sceneflow variables.

²http://mary.dfki.de
³http://horde3d.org

Fig. 3 The Visual SceneMaker IDE showing different working areas and highlighting modes

Fig. 4 The SceneMaker’s system architecture with its sceneplayer and plug-in mechanism

4 New Sceneflow elements

We extended the definition of sceneflows with concepts for a hierarchical and parallel decomposition of sceneflows, different interruption policies, and a runtime history, adopting and extending modeling concepts as they can be found in several statechart variants [3, 11].

Hierarchical and parallel decomposition By the use of a hierarchical and parallel model decomposition, multiple virtual characters and their behavioral aspects, as well as multiple control processes for event and interaction management, can be modeled as concurrent processes in separate parallel automata. Thus, individual behavioral aspects and control processes can be modeled and adapted in isolation, without knowing details of the other aspects, while previously modeled behavioral patterns can easily be reused and extended.

Sceneflows provide two instruments allowing an author to create multiple concurrent processes at runtime: (1) by defining multiple startnodes, and thus multiple parallel subautomata, for a supernode (see Fig. 5 (1)), and (2) by using fork edges (see Fig. 5 (2)), each of which creates a new process.

Fig. 5 (1) The hierarchical and parallel decomposition of the virtual world with multiple startnodes and (2) the parallel decomposition of behavioral aspects with multiple fork edges

Fig. 6 (1) Process synchronization over the variable sync and (2) over a configuration query

Fig. 7 (1) Interruptive and ordinary conditional edges in a supernode hierarchy as well as (2) an example of the use of a history node and a history condition for dialogue resumption

Communication and synchronization Individual behavioral aspects contributing to the behavior of a virtual character and the processing of events for interaction management are usually not completely independent. For example, speech is usually highly synchronized with non-verbal behavioral modalities like gestures and body postures. When modeling these aspects in parallel automata, the processes that concurrently execute these automata have to be synchronized by the author in order to coordinate them all with each other.

Sceneflows provide two instruments for an asynchronous, non-blocking synchronization of concurrent processes, realized by a shared memory model. They allow synchronization over (1) shared variables defined in the scope of some common supernode (see Fig. 6 (1)) and (2) a more intuitive mechanism for process synchronization, a state query (see Fig. 6 (2)), which allows testing whether a certain state is currently being executed by another concurrent process.
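As an illustration, consider a speech automaton and a gaze automaton running in parallel. The sketch below paraphrases the two synchronization instruments in informal notation; the node names Speak and Gaze, the variable name sync, and the condition syntax are assumptions for this example, not the tool's concrete sceneflow syntax.

    // (1) synchronization over a shared variable defined in a common supernode:
    //     the speech process sets the variable, the gaze process waits on a conditional edge
    Speak:  play scene Greeting; sync = true
    Gaze:   conditional edge with condition (sync == true)  ->  play gaze scene

    // (2) synchronization over a state query:
    //     the gaze process instead tests whether node Speak is currently being executed
    Gaze:   conditional edge with state query (Speak active)  ->  play gaze scene

In both variants the producing process continues without waiting; coordination happens only through the shared memory model described above.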

Interaction handling with interruption policies User interactions and environmental events can arise at any time during the execution of a model. Some of them need to be processed as fast as possible to meet certain real-time requirements, for example when interrupting a character’s utterance in order to give the user the impression of presence. Other events may be processed at a later point in time, allowing currently executed scenes or commands to terminate regularly. These two interaction handling strategies are realized in sceneflows with the two types of conditional edges, which have different interruption and inheritance policies. In contrast to ordinary conditional edges, interruptive conditional edges directly interrupt the execution of any processes and have priority over any other edges lower in the supernode hierarchy; thus, they are used for handling events and user interactions requiring a prompt reaction.

Figure 7 (1) shows ordinary and interruptive conditional edges in a supernode hierarchy. When the condition stop becomes true during the execution of the innermost scene playback command, the outer interruptive edge to the outer end node is taken immediately, whereas the inner ordinary conditional edge is taken only after the playback command has terminated, and the outer ordinary conditional edge is taken only after the inner end node has terminated.
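Read as a comparison of reaction times, the example can be summarized as follows (an informal paraphrase of the behavior just described, not the tool's notation):

    // the condition stop becomes true while the innermost scene playback command is running
    outer interruptive edge           ->  reacts immediately, aborting the playback
    inner ordinary conditional edge   ->  would react only after the playback command terminates
    outer ordinary conditional edge   ->  would react only after the inner end node terminates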

Author's personal copy

8 J Multimodal User Interfaces (2012) 6:3–11

Table 1 Feature comparison between the original SceneMaker and the new Visual SceneMaker

Dialogue resumption with runtime history During a sceneflow’s execution, the interpreter maintains a runtime history to record the runtimes of nodes, the values of local variables, executed system commands, and scenes that were played back, as well as the nodes that were last executed. The automatic maintenance of this history memory relieves the author of the manual collection of such data, thus reducing the modeling effort while increasing the clarity of the model and providing the author with rich information about previous interactions and states of execution. This facilitates the modeling of reopening strategies and recapitulation phases for dialogues. The history concept is realized graphically as a special history node which is a fixed child node of each supernode. When a supernode is re-executed, the history node is executed instead of its default startnodes. Thus, the history node serves as a starting point for authors to model reopening strategies or recapitulation phases. The scripting language of sceneflows provides a variety of expressions and conditions to request information from the history memory or to delete it. Figure 7 (2) shows the use of the history node of supernode S1 and the history condition HCS(S1, S3), used to find out whether node S3 was the last executed subnode of S1. If the supernode S1 is interrupted and afterwards re-executed, it starts at its history node N1. If it had been interrupted during the explanation phase in nodes S3 and S5, the explanation phase is reopened; otherwise, the welcome phase in nodes S2 and S4 is restarted.
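The resumption logic of Fig. 7 (2) can be sketched as two alternative edges leaving the history node. This is an illustrative paraphrase: apart from the history condition HCS(S1, S3) named above, the edge arrows and the "otherwise" label are our notation, not the tool's exact syntax.

    // history node N1 of supernode S1 is entered instead of the default startnode on re-execution
    N1 --[ HCS(S1, S3) ]-->  reopen the explanation phase (nodes S3 and S5)
    N1 --[ otherwise   ]-->  restart the welcome phase (nodes S2 and S4)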

Feature comparison The introduced new sceneflow elements support a more flexible creation of interactive applications with multiple virtual characters. Table 1 compares the original and the new version of the authoring tool.

With the new features, authors are now able to model multiple virtual characters individually with separate sceneflows, which allows a fine-grained synchronization. Moreover, the use of parallel sceneflow elements allows modeling behavioral aspects with these structures. While the old SceneMaker version supports an animation of characters through scene gesture commands (which typically have to be used in each scene, e.g. for gaze behavior), the new version can process these commands in an additional parallel sceneflow. In addition, this enables an effortless general extension of existing SceneMaker applications. The task of extending an application, e.g. with camera movements that focus on the current speaker, can easily be fulfilled by modeling a new parallel sceneflow that executes the respective camera commands. Overall, this helps to reduce modeling costs in several situations. The new history node structure liberates authors from the use of external dialogue and context memories, which often come with their own interface language. By providing a simple integrated dialogue history memory, a fine-grained handling of dialogue resumption and dialogue interruption is feasible with “on-board” equipment. Finally, we believe that using the mentioned concepts in a visual programming paradigm, together with an IDE that supports the manipulation and the execution of SceneMaker applications, results in a powerful, easy-to-handle approach for the creation of interactive multiple virtual character applications.

5 Evaluation

SceneMaker was first informally evaluated in two field tests (at the Nano-Camp 2009 and the Girls’ Day 2010 [6]) with 21 secondary and high school students aged 12 to 16. Overall, the students described SceneMaker as an easy and intuitive tool which supports the rapid creation of applications with interactive virtual characters.

We evaluated the Visual SceneMaker in a user study with 19 subjects aged 23 to 41 (average age 30.89). The subject group consisted of 14 males and 5 females; 9 of the subjects were university students and 10 were researchers from multimedia and computer science. The subjects had to model an interactive sales conversation between a user and two embodied conversational agents. The participants were briefly introduced to the software and were given a textual description of the expected dialogue structure. Parts of the dialogue were pre-modeled and served as an orientation point for the participants to model the remaining dialogue without assistance. The dialogue specification required the employment of the new modeling concepts, such as different interruption policies and the history concept, as well as hierarchical refinement and parallel decomposition. The modeling sessions lasted from 24 to 67 minutes, with an average of 40.05 minutes. Afterwards, the participants filled out a questionnaire on their previous knowledge and several five-point Likert scale questionnaires addressing Visual SceneMaker’s modeling concepts and system usability. We used questionnaires Q1 and Q2 (see Fig. 8) for the assessment of SceneMaker’s modeling concepts, and Q3 and Q4 for measuring the system usability scale (SUS) from [4], to get a view of the subjective assessment of general system usability. The SUS covers a variety of aspects of system usability, such as the need for support, training, and complexity, and has proven to be a very effective means to measure usability in a large number of usability tests. For example, Bangor and colleagues [2] applied it to a large range of user interfaces in nearly 3500 surveys within 273 usability studies. The SUS helps us to interpret the results of our experiment by comparing the scores of our study with the scores reported in those studies.

Fig. 8 The questionnaires of the evaluation sheet and the results of the user study

Fig. 9 The subject group comparisons with respect to their previous knowledge

The results of the descriptive analysis of the questionnaire are shown in Fig. 8. To show that the mean rating value was significantly above the neutral value, we applied one-sample t-tests. These showed that the participants got along very well with dialogue modeling and interaction (t(18) = 21.02, p < 0.0005), multimodal dialogue creation (t(18) = 11.56, p < 0.0005), testing and visualization features (t(18) = 13.91, p < 0.0005), and overall system usability (t(18) = 44.04, p < 0.0005). Furthermore, we compared different subject and experience groups. Rating comparisons of male subjects to female subjects and students to researchers, as well as comparisons of different experience groups with respect to modeling time and age, revealed no significant differences.

The results of the experience group comparisons, shown in Fig. 9, revealed that subjects with high experience in model-oriented programming (N: 15, M_Q3: 4.73, SD_Q3: 0.29, M_Q4: 84.17, SD_Q4: 6.99) gave slightly better scores for Q3 and Q4 than those with low experience (N: 4, M_Q3: 4.00, SD_Q3: 0.72, M_Q4: 74.38, SD_Q4: 6.25; t_Q3(17) = 3.26, p_Q3 = 0.005, t_Q4(17) = 2.54, p_Q4 = 0.021). This was also observed for Q4 in the comparison of subjects with high overall experience (N: 7, M: 87.50, SD: 5.95) to subjects with low overall experience (N: 12, M: 78.95, SD: 7.19; t(17) = 2.65, p = 0.017). Another comparison showed that subjects with a SUS score below average (N: 7, M: 49.43 min, SD: 12.15 min) needed more time to finish the task than subjects with a high SUS score (N: 12, M: 34.58 min, SD: 7.14 min; t(17) = −3.38, p = 0.004).

Although subjects with less programming experience needed more time and gave slightly lower system usability scores, the overall results show that the users generally felt very confident with the tool. Overall, the obtained SUS score was very promising. Compared to the average result of around 70 on the SUS scale from 0 to 100 that Bangor and colleagues [2] obtained for the large variety of user interfaces they evaluated, Visual SceneMaker’s performance (Min: 67.60, Max: 97.50, M: 82.11, SD: 7.62) is well above the reported average values and, according to their usability classification, may be interpreted as nearly excellent.

6 Summary and future work

In this paper, we have presented Visual SceneMaker, an authoring tool for the creation of interactive applications with multiple virtual characters. It extends the initial version, providing creative, non-programming experts with a simple scripting language for the creation of rich and compelling content. The separation of dialogue content and narrative structure, based on scenes and sceneflows, allows an independent modification of both.

A powerful feature for the creation of new interactive performances in a rapid prototyping style is the reusability of sceneflow parts. This is explicitly supported by the sceneflow editor, which is part of the new Visual SceneMaker IDE. This graphical authoring tool supports authors with drag-and-drop facilities to draw a sceneflow by creating nodes and edges. The editor also supports new structural extensions of the sceneflow model. The improvements comprise concurrent sceneflows and a special history node type. With the use of concurrent sceneflows, large sceneflows can be decomposed into logical and conceptual components, which not only reduces the number of structures but also helps to organize the entire model of an interactive performance. In addition, concurrent structures can be used to model reactive behavioral aspects (e.g. gaze). This reduces the effort of creating pre-scripted scenes. History nodes are introduced for a more flexible modeling of reopening strategies and recapitulation phases in dialogue. This new sceneflow structure provides access to the plot history in every node, a valuable help for the creation of non-static and lively interactive performances.

The Visual SceneMaker IDE is built upon an interpreter for the execution of sceneflows. The graphical authoring tool allows the visualization of the execution progress of a given SceneMaker application. In addition, it provides an immediate observation of the effects caused by modifications of the model, even at runtime. This is a crucial help when debugging and testing a model in early development phases as well as in refinement and modification phases.

We tested the flexibility of the Visual SceneMaker approach first in field tests and second in a user study. As a result, we found that the visual programming approach and the provided modeling structures are easily comprehensible and let non-experts create virtual character applications in a rapid prototyping fashion. This conclusion is especially justified by the excellent usability scores that were reached in the user study.

For future work, we plan to integrate the Visual SceneMaker with other crucial components (speech synthesis, nonverbal behavior generation, emotion simulation, . . .) and to integrate it into a component-based embodied agent framework [15]. This implies the use of standard languages for intention (FML), behavior (BML) and emotion (EmotionML) to make the Visual SceneMaker compatible with alternative components [16, 23]. We also plan to introduce a library of reusable behavior patterns for an easy creation of different behavioral aspects for multiple virtual characters in other interactive performances.

Acknowledgements The presented work was supported by the German Federal Ministry of Education and Research in the SemProM project (grant number 01 IA 08002) and the INTAKT project (grant number 01 IS 08025B), and by the European Commission within the 7th Framework Programme in the IRIS project (project number 231824). The authors would like to thank Elisabeth André for supporting us with her expertise.

References

1. ALICE. Homepage of the A.L.I.C.E. Artificial Intelligence Foundation. Online: http://www.alicebot.org

2. Bangor A, Kortum P, Miller J (2009) Determining what individual SUS scores mean: adding an adjective rating scale. J Usability Stud 4:114–123

3. Beeck Mvd (1994) A comparison of statecharts variants. In: Proceedings of the third international symposium organized jointly with the working group provably correct systems on formal techniques in real-time and fault-tolerant systems, London, UK. Springer, Berlin, pp 128–148

4. Brooke J (1996) SUS: a quick and dirty usability scale. In: Jordan PW, Weerdmeester B, Thomas A, Mclelland IL (eds) Usability evaluation in industry. Taylor and Francis, London

5. Brusk J, Lager T, Wik AHaP (2007) DEAL: dialogue management in SCXML for believable game characters. In: Proceedings of ACM Future Play, New York, NY, USA. ACM, New York, pp 137–144

6. Endrass B, Wissner M, Mehlmann G, Buehling R, Haering M, André E (2010) Teenage girls as authors for digital storytelling—a practical experience report. In: Workshop on education in interactive digital storytelling at ICIDS, Edinburgh, UK

7. Engel R (2005) Robust and efficient semantic parsing of free word order languages in spoken dialogue systems. In: Interspeech 2005

8. Gebhard P, Kipp M, Klesen M, Rist T (2003) Authoring scenes for adaptive, interactive performances. In: Rosenschein JS, Wooldridge M (eds) Proc of the second international joint conference on autonomous agents and multi-agent systems. ACM, New York, pp 725–732

9. Gratch J, Rickel J, André E, Cassell J, Petajan E, Badler NI (2002) Creating interactive virtual humans: some assembly required. IEEE Intell Syst 17(4):54–63

10. Gratch J, Marsella S, Wang N, Stankovic B (2009) Assessing the validity of appraisal-based models of emotion. In: International conference on affective computing and intelligent interaction, Amsterdam. IEEE, New York

11. Harel D (1987) Statecharts: a visual formalism for complex systems. In: Science of computer programming, vol 8. Elsevier, Amsterdam, pp 231–274

12. Heidig S, Clarebout G (2011) Do pedagogical agents make a difference to student motivation and learning? Educ Res Rev 6(1):27–54

13. Iurgel IA, da Silva RE, Ribeiro PR, Soares AB, dos Santos MF (2009) CREACTOR—an authoring framework for virtual actors. In: Proceedings of the 9th international conference on intelligent virtual agents. LNAI, vol 5773. Springer, Berlin, pp 562–563

14. Kipp M, Neff M, Kipp KH, Albrecht I (2007) Toward natural gesture synthesis: evaluating gesture units in a data-driven approach. In: Proceedings of the 7th international conference on intelligent virtual agents. LNAI, vol 4722, pp 15–28

15. Kipp M, Heloir A, Gebhard P, Schröder M (2010) Realizing multimodal behavior: closing the gap between behavior planning and embodied agent presentation. In: Proceedings of the 10th international conference on intelligent virtual agents (IVA-10). Springer, Berlin

16. Kopp S, Krenn B, Marsella S, Marshall AN, Pelachaud C, Pirker H, Thórisson KR, Vilhjálmsson H (2006) Towards a common framework for multimodal generation: the behavior markup language. In: Proc of IVA-06

17. McTear MF (1999) Using the CSLU toolkit for practicals in spoken dialogue technology. In: University College London, pp 1–7

18. Miksatko J, Kipp KH, Kipp M (2010) The persona zero-effect: evaluating virtual character benefits on a learning task. In: Proceedings of the 10th international conference on intelligent virtual agents (IVA-10). Springer, Berlin

19. Perlin K, Goldberg A (1996) Improv: a system for scripting interactive actors in virtual worlds. In: Computer graphics proceedings, ACM SIGGRAPH, New York, pp 205–216

20. Prendinger H, Saeyor S, Ishizuka M (2004) MPML and SCREAM: scripting the bodies and minds of life-like characters. In: Life-like characters—tools, affective functions, and applications. Springer, Berlin, pp 213–242

21. Schröder M (2008) Approaches to emotional expressivity in synthetic speech. In: Emotions in the human voice, culture and perception, vol 3. Plural, San Diego, pp 307–321

22. Spierling U, Mueller SWaW (2006) Towards accessible authoring tools for interactive storytelling. In: Proceedings of technologies for interactive digital storytelling and entertainment. LNCS. Springer, Heidelberg

23. Vilhjalmsson H, Cantelmo N, Cassell J, Chafai NE, Kipp M, Kopp S, Mancini M, Marsella S, Marshall AN, Pelachaud C, Ruttkay Z, Thórisson KR, van Welbergen H, van der Werf RJ (2007) The behavior markup language: recent developments and challenges. In: Proc of IVA-07
