
ELSEVIER Computer Standards & Interfaces 18 (1997) 555-563

WIP and PPP: a comparison of two multimedia presentation systems in terms of the standard reference model

Elisabeth André * German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany

Abstract

In this paper, the Standard Reference Model for IMMPS is used for analysing the architectural design of two multimedia presentation systems which have been built at DFKI over the past 10 years. We first present the systems as described by the authors and then compare them in terms of the reference model. © 1997 Elsevier Science B.V.

Keywords: Standard reference model; Intelligent multimedia presentation systems; WIP; PPP

1. Introduction

An objective of the Standard Reference Model for Intelligent Multimedia Presentation Systems (IMMPS) (cf. [1], this volume) is to provide a common framework which facilitates the analysis and comparison of this class of systems. In this paper, we will perform such a comparison between two concrete IMMPS which have been developed at DFKI.

The first system, WIP (Knowledge-based Presentation of Information), generates multimedia instructions for the maintenance and repair of technical devices (cf. [2,3]). WIP is a highly adaptive interface since all of its output is generated on the fly and customized for the intended target audience and situation. The quest for adaptation is based on the fact that it is impossible to anticipate the needs and requirements of each potential user in an infinite number of presentation situations. Thus, all presentation decisions are postponed until runtime.

* E-mail: [email protected]

The more advanced system PPP (Personalized Plan-based Presenter) also addresses the temporal coordination of different media as it comes into play when using a life-like character to present multimedia material (cf. [4]). That is, PPP not only creates multimedia material, but also designs a script which specifies how this material should be presented to the user.

In this paper, both systems are briefly described, including an outline of the underlying generation approach, using the authors’ original terminology. In the last part of the paper, both systems are redescribed and compared using the Reference Model for IMMPS.

2. The knowledge-based presentation system WIP

The major design goals of WIP are the generation of coordinated presentations from a common representation, the adaptation of these presentations to the intended target audience and situation, and the incrementality of all processes constituting the design and realization of the multimedia output.

0920-5489/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0920-5489(97)00019-6

2.1. Scope of the WIP system

The design and the generation of a presentation in WIP is goal-driven and controlled by a set of generation parameters, such as document type, target group, resource limitations, and target language. In order to illustrate the effect of the generation parameters on the presentation, let us have a look at a snapshot of a system run (see Fig. 1). In this example, the system is requested to instruct the user in preparing a modem for data reception. On the right-hand side, a formal specification of this goal is shown. (BMB S U (Plan prepare-modem-1 ?p)) means that both the system and the user should know a plan for preparing the modem for data reception.

Generation parameters are set via the pop-up menu. The user has indicated that he is an English speaker (Target Language: English) and not familiar with the technical domain (User Category: Novice). Furthermore, he informs the system that he is interested in a short presentation printed in the style of an instruction manual by setting the parameter ‘Space Restricted’ to ‘Yes’, ‘Speech Output’ to ‘No’, and ‘Document Type’ to ‘Instruction Manual’. As the option incremental output mode is chosen, the system begins typing out text fragments and graphical elements as soon as they are generated. Specific medium preferences are not made (Preferred Medium: ‘None’).

The generated presentation is shown above the parameter menu. In contrast to a document retrieval system, all parts of the presentation are generated from scratch. For example, in order to make the code switch visible, WIP decided to show the top cover of the modem in an exploded view style. This has been achieved by manipulating the 3D wireframe model of the modem before the projection was done.

Starting the system again with the same presentation goals but different parameter settings leads to major changes in the presentation. For instance, assuming that the user is a modem expert who knows where the individual modem components are located, there is no need to show object locations in a picture.
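To make the role of the generation parameters concrete, the following minimal sketch models the menu settings from the example as a small record and applies one toy rule from the text (experts do not need object locations depicted). The class name, field names and the rule function are illustrative assumptions; they are not taken from the WIP implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GenerationParameters:
    """Hypothetical container for the menu settings named in the text."""
    target_language: str = "English"
    user_category: str = "Novice"          # "Novice" or "Expert"
    space_restricted: bool = True
    speech_output: bool = False
    document_type: str = "Instruction Manual"
    output_mode: str = "Incremental"
    preferred_medium: Optional[str] = None  # None means no preference


def needs_location_picture(params: GenerationParameters) -> bool:
    # Toy rule (an assumption): only novices need to be shown where
    # the modem components are located.
    return params.user_category == "Novice"


# Same presentation goal, two different parameter settings.
novice = GenerationParameters()
expert = replace(novice, user_category="Expert")
```

With these settings, `needs_location_picture(novice)` holds while `needs_location_picture(expert)` does not, mirroring the omitted picture in the expert run.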


Fig. 1. Snapshot of a system run.


[Figure 2, left panel. Target Language: English; User Category: Expert. Generated text: Set the modem for reception of data. Connect the telephone cable. Turn on the modem.]

[Figure 2, right panel. Target Language: German; User Category: Expert. Generated text: Schalten Sie das Modem auf Empfang. Schließen Sie das Telephonkabel an. Schalten Sie das Modem ein.]

Fig. 2. Influence of generation parameters ‘Target Group’ and ‘Target Language’

The left-hand side of Fig. 2 shows the generation result after the presentation parameter ‘User Category’ has been set to ‘Expert’. Note that not only the picture has been omitted, but that the text produced has a higher degree of abstraction than in the first example. In a further system run, we demonstrate the influence of the generation parameter ‘Target Language’ (see the right-hand side of Fig. 2).

2.2. The generation approach

A basic assumption behind the WIP model is that not only the generation of text and dialogue contributions, but also the design and presentation of graphics and multimodal documents are planning tasks (cf. [5]). As input, WIP receives a presentation goal. This goal is forwarded to the presentation planner, which tries to find a presentation strategy which matches this goal and generates a refinement-style plan in the form of a directed acyclic graph (DAG). This DAG reflects the propositional contents of the potential document parts, the intentional goals behind the parts, as well as the rhetorical relationships between them. While the top of the presentation plan is a more or less complex presentation goal (e.g., instructing the user in switching on a device), the lowest level is formed by specifications of elementary presentation tasks (e.g., formulating a request or depicting an object). These elementary tasks are directly forwarded to the text and graphics generators (cf. Fig. 3).
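The refinement of a presentation goal into elementary tasks can be illustrated with a small sketch. It assumes a hypothetical table of presentation strategies (goal to subgoals) and a hypothetical table of elementary tasks with the generator that handles each; neither the goal names nor the data layout come from WIP, and rhetorical relations are left out for brevity.

```python
# Hypothetical presentation strategies: goal -> ordered subgoals.
STRATEGIES = {
    "instruct-switch-on": ["localize-switch", "request-push-switch"],
    "localize-switch": ["depict-switch", "label-switch"],
}

# Hypothetical elementary tasks and the generator responsible for each.
ELEMENTARY = {
    "depict-switch": "graphics",
    "label-switch": "text",
    "request-push-switch": "text",
}


def refine(goal, plan=None):
    """Expand a goal top-down; return a dict mapping node -> child nodes."""
    if plan is None:
        plan = {}
    if goal in plan:              # node already expanded (DAG, not a tree)
        return plan
    if goal in ELEMENTARY:
        plan[goal] = []           # leaf: an elementary presentation task
        return plan
    if goal not in STRATEGIES:
        raise LookupError(f"no presentation strategy matches {goal!r}")
    plan[goal] = list(STRATEGIES[goal])
    for subgoal in STRATEGIES[goal]:
        refine(subgoal, plan)
    return plan


def elementary_tasks(plan, root):
    """Collect the leaves in left-to-right order with their generator."""
    seen, leaves = set(), []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        if not plan[node]:
            leaves.append((node, ELEMENTARY[node]))
        for child in plan[node]:
            visit(child)

    visit(root)
    return leaves


plan = refine("instruct-switch-on")
tasks = elementary_tasks(plan, "instruct-switch-on")
```

Here `tasks` lists each leaf together with the generator ("text" or "graphics") to which it would be forwarded.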

[Figure 3 component labels: Text Design, Graphics Design, Text Realization, Graphics Realization]

Fig. 3. Architecture of the WIP system.

Each generator consists of an incremental design component and an incremental realization component, which together form a cascade. Thus, the basic modularization is the same for both text and graphics generation, resulting in two parallel cascades. The presentation planner and the media-specific generators interact incrementally in a pipelined mode. In other words, text and graphics design, and even verbalization and visualization, can start before the presentation plan is completed. The text and graphics design components can be seen as micro-planners of the what-to-say and what-to-show parts of the media-specific generators. For example, lexical choice is not carried out by the presentation planner on the macro-plan level, but by the text design component.
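The pipelined, incremental interaction described above can be sketched with lazy Python generators. This is a simplification under stated assumptions: a single cascade instead of two parallel ones, no feedback from design back to the planner, and invented task and specification formats. The `log` list only records the order in which the stages run.

```python
log = []  # records the order in which the stages do their work


def planner(goal):
    # Hypothetical planner that emits elementary tasks one at a time.
    for i, medium in enumerate(["text", "graphics", "text"], start=1):
        log.append(f"plan:{i}")
        yield {"id": i, "medium": medium, "goal": goal}


def design(tasks):
    # Design stage: turns each task into a (made-up) design specification.
    for task in tasks:
        log.append(f"design:{task['id']}")
        yield {**task, "spec": f"{task['medium']}-spec-{task['id']}"}


def realize(specs):
    # Realization stage: encodes each specification as output.
    for spec in specs:
        log.append(f"realize:{spec['id']}")
        yield f"<{spec['spec']}>"


# Pulling from the end of the cascade drives all earlier stages lazily.
output = list(realize(design(planner("prepare-modem"))))
```

Because each stage pulls its input on demand, the first realization step appears in `log` before the planner has emitted its last task, which is the property the pipelined mode is meant to provide.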

There is no direct communication from a media-specific realization module back to the presentation planner or layout manager; all such communication is mediated by the corresponding design module. As soon as the presentation planner and the layout manager have made enough commitments to allow the media-specific generators to start work, the text and/or graphics design components are activated. Control then passes back and forth between the modules of the cascade, interleaving their execution.

3. The personalized presenter PPP persona

One limitation of the WIP system is that it merely generates the material to be presented, such as text-picture combinations or animation sequences. It does not plan when and how to present this material to a particular user. Experience shows that a multimedia presentation may fail despite the high quality of the material to be presented. This can often be observed in cases where multimedia output is distributed over several windows, thus requiring the user to find out herself how to navigate through the presentation.

To enhance the effectiveness of computer-based communication, we embarked on a new project, called PPP (Personalized Plan-Based Presenter).

3.1. Scope of the PPP system

In contrast to WIP, the PPP system employs a life-like character, the so-called PPP Persona, which acts as a presenter, showing, explaining, and verbally commenting on textual and graphical output on a window-based interface. That is, PPP has to design multimedia material and to plan presentation acts and their temporal coordination.

To demonstrate the performance of PPP, let us consider again the task of explaining how to operate a modem. While WIP only generates static presentations, PPP produces an audio-visual presentation given by the interface agent (see Fig. 4). As in WIP, the system first creates a window showing the modem’s circuit board. After the window has appeared on the screen, the PPP Persona takes up a suitable position for telling the user what he has to do. It first verbally informs the user that he should push the code switch S-4 to the right.

Fig. 4. Dynamic annotation.

As in WIP, it assumes that the user is not able to localize the switch. However, while WIP introduces objects by drawing text labels and arrows onto graphics (see Fig. 1), the PPP Persona enables the realization of dynamic annotation forms as well. In the example, it points to the code switch and utters ‘This is the code switch S-4.’ (using a speech synthesizer). One advantage of this method over static annotations is that the system can influence the temporal order in which the user processes a graphical depiction. It is even possible to combine both methods, since the PPP Persona can also place textual labels on the illustrations before the user’s eyes. After that, the Persona describes the remaining actions to be carried out (not shown in the illustration).

3.2. Generation approach

Basically, PPP relies on the WIP approach for presentation planning (cf. [6]). However, in order to enable both the creation of multimedia objects and the generation of scripts for presenting the material to the user, the following extensions have become necessary.

- The distinction between production and presentation acts

Whereas production acts refer to the creation of material, presentation acts are display acts, such as S-Display-Text, or acts which are carried out by the PPP Persona, e.g., S-Point.

- The specification of qualitative and quantitative temporal constraints in the presentation strategies

Qualitative constraints are represented in an ‘Allen-style’ fashion, which allows for the specification of thirteen temporal relationships between two named intervals, e.g., (Speak1 (During) Point2). Quantitative constraints appear as metric (in)equalities, e.g., (5 < Duration Point2).

- The development of a mechanism for building up and refining presentation schedules

To temporally coordinate presentation acts, the presentation planner has been combined with a temporal reasoner which is based on MATS (Metric/Allen Time System, cf. [7]). During the presentation planning process, PPP determines the transitive closure over all qualitative constraints and computes numeric ranges over interval endpoints and their difference. After that, a presentation script is built up by resolving all disjunctions and computing a total temporal order. Since the temporal behavior of presentation acts may be unpredictable at design time, the script will be refined at runtime by adding new metric constraints to the constraint network.
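As an illustration of how numeric ranges over interval endpoints can be derived, the sketch below encodes the two example constraints, (Speak1 (During) Point2) and (5 < Duration Point2), as bounds on differences between time points and closes the network with the Floyd-Warshall algorithm. This is a generic simple temporal network, not the MATS reasoner: disjunctive constraints are not handled, strict inequalities are approximated by assuming integer time ticks, and the lower bound on speaking time and the upper bound on pointing time are invented for the example.

```python
import itertools

INF = float("inf")


class TemporalNetwork:
    """Time points with constraints of the form lo <= b - a <= hi."""

    def __init__(self, points):
        self.points = list(points)
        # d[a][b] is the tightest known upper bound on (b - a).
        self.d = {a: {b: (0 if a == b else INF) for b in self.points}
                  for a in self.points}

    def add(self, a, b, lo=-INF, hi=INF):
        """Constrain lo <= b - a <= hi."""
        self.d[a][b] = min(self.d[a][b], hi)
        self.d[b][a] = min(self.d[b][a], -lo)

    def propagate(self):
        """All-pairs shortest paths; returns False if inconsistent."""
        for k, i, j in itertools.product(self.points, repeat=3):
            via = self.d[i][k] + self.d[k][j]
            if via < self.d[i][j]:
                self.d[i][j] = via
        # A negative cycle shows up as a negative diagonal entry.
        return all(self.d[p][p] >= 0 for p in self.points)

    def bounds(self, a, b):
        """Range (lo, hi) of b - a after propagation."""
        return (-self.d[b][a], self.d[a][b])


net = TemporalNetwork(["s_speak", "e_speak", "s_point", "e_point"])
net.add("s_speak", "e_speak", lo=3)         # assumed: speaking takes >= 3 ticks
net.add("s_point", "e_point", lo=6, hi=10)  # 5 < Duration Point2; cap of 10 assumed
net.add("s_point", "s_speak", lo=1)         # During: Point2 starts before Speak1
net.add("e_speak", "e_point", lo=1)         # During: Speak1 ends before Point2
consistent = net.propagate()
speak_duration = net.bounds("s_speak", "e_speak")
```

Under these assumptions the network is consistent and the duration of Speak1 is narrowed to between 3 and 8 ticks. Refining the script at runtime would correspond to calling `add` with a newly observed metric constraint and propagating again.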

As in WIP, the presentation planner decomposes a complex presentation goal into elementary acts. However, PPP distinguishes between elementary production and presentation acts which have to be handled in a different way. Elementary production acts are sent to the corresponding generators which, in turn, inform the presentation planner when they have accomplished their task and how they have encoded a certain piece of information. The results of the generators are taken as input for the design of the presentation script which is forwarded to the display components for execution (see Fig. 5).

Fig. 5. Architecture of the PPP system.

The task of the layout manager is the determination of effective screen layouts and the maintenance of user interactions. The Persona Server (cf. [8]) carries out the Persona actions which, among other things, include assembling appropriate animation sequences. Furthermore, it augments the presentation with believability-enhancing behaviors, such as idle-time acts. Both display components signal when they have accomplished their tasks and inform the presentation planner about the occurrence of interaction events, such as mouse-clicks on windows or the Persona.

The realization of the Persona Server follows the client/server paradigm; i.e., client applications can send requests for the execution of presentation tasks to the server (cf. [8]). Since it depends on the Persona’s current state whether or not a request can be handled immediately, requests are buffered in an input queue (cf. Fig. 6). In return, a confirmation is sent back to the application after a task has been performed. Application clients within the PPP system are the PPP presentation planner and the layout manager. The platform interface bridges to the underlying window system and to several other external devices, such as speech generators (for different languages) and an audio player. Conversely, interaction events recognized by the window system and return values of the external devices are received by the platform interface.

Fig. 6. Architecture of the Persona Server.

The inner components of the server are a behavior monitor, an event handler and a character composer. The task of the event handler is to recognize whether input derived from the platform interface needs immediate responses from the Persona. That is, the event handler checks for each input message whether the message triggers one of the so-called ‘reactive behaviors’ stored in an internal knowledge base. If this is the case, the selected behavior is made accessible to the behavior monitor. The task of the behavior monitor is to decide which action to execute next. For instance, if the Persona has no other tasks to perform, it will run an idle-time script that is selected from an internal knowledge base. The character composer is responsible for running the Persona animations. For each posture and action, it selects frames (video frames or drawn images) from an indexed database and forwards the display commands to the window system.
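The interplay of input queue, event handler and behavior monitor might be sketched as follows. The class and method names, the contents of the two knowledge bases, and in particular the priority order (reactive behaviors first, then buffered client requests, then an idle-time script) are assumptions made for illustration; the text does not specify them, and the character composer is omitted.

```python
from collections import deque

# Hypothetical knowledge bases.
REACTIVE_BEHAVIORS = {"mouse-click-on-persona": "turn-to-user"}
IDLE_SCRIPTS = ["blink"]


class PersonaServer:
    def __init__(self):
        self.requests = deque()    # buffered client requests (input queue)
        self.reactive = deque()    # behaviors selected by the event handler
        self.confirmations = []    # (client, task) pairs sent back to clients

    def request(self, client, task):
        """A client asks for the execution of a presentation task."""
        self.requests.append((client, task))

    def handle_event(self, message):
        """Event handler: does this input trigger a reactive behavior?"""
        behavior = REACTIVE_BEHAVIORS.get(message)
        if behavior is not None:
            self.reactive.append(behavior)

    def step(self):
        """Behavior monitor: decide which action to execute next."""
        if self.reactive:                       # assumed to take priority
            return self.reactive.popleft()
        if self.requests:
            client, task = self.requests.popleft()
            self.confirmations.append((client, task))  # confirm completion
            return task
        return IDLE_SCRIPTS[0]                  # nothing to do: idle-time act


server = PersonaServer()
server.request("presentation-planner", "S-Point")
server.handle_event("mouse-click-on-persona")
actions = [server.step() for _ in range(3)]
```

In this run the reactive behavior is executed first, then the buffered request from the presentation planner, and finally an idle-time script once no other work remains.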

4. Comparison of WIP and PPP using the reference model

In the following, we will compare WIP and PPP in terms of the reference model. That is, the components of both systems are mapped onto the layers of the reference architecture. Furthermore, we indicate how the expert modules of the reference architecture are instantiated by WIP and PPP.

4.1. Goal formulation

Both in WIP and PPP, Goal Formulation can be done by the user via a menu interface. This interface allows the user to modify the generation parameters and to choose a goal to be achieved. In addition, PPP offers the user a simple hypermedia-style interface which allows the user to select mouse-sensitive parts of a graphic or a text with the mouse. On the basis of these mouse-clicks, the system generates a menu of follow-up questions which may be asked in the current context. That is, in PPP, a user’s goal may also be determined as a function of a selected question. In both systems, a goal is expressed as a mental state which the presentation is to bring about in the viewer.

4.2. Control layer

Both in WIP and PPP, the selection of the next goals to be accomplished is done by a subcomponent of the presentation planner. Since WIP only generates non-interactive presentations, no recovery strategy is provided by the control layer when the user interrupts a presentation. Unlike WIP, PPP also allows for user interaction. Therefore, the control layer in PPP is realized by two components: the event handler that decides how to react to external events and the presentation planner that decides which goal should be accomplished next.

4.3. Content layer

The Content Layer is responsible for high-level authoring tasks, such as selecting appropriate contents, content structuring and media allocation. In both systems, these tasks are handled by the presentation planner. Thus, the presentation planners of WIP and PPP actually span two layers of the generic reference architecture. An integral part of PPP’s presentations are believability-enhancing behaviors that are determined by the behavior monitor. Consequently, the content layer in PPP is realized by two components, while in WIP all tasks of the content layer are handled by the presentation planner.

4.4. Design layer

The design layer embodies a number of media-specific design components. In WIP, there are design components for text and graphics; in PPP, additional components for animation, gestures and music have been integrated. Furthermore, both systems rely on a layout design component which is responsible for setting up layout constraints. While WIP only handles constraints for the spatial layout, PPP also specifies the temporal behavior of a presentation by means of qualitative and metric temporal constraints. The spatial layout is done by a separate component, while the temporal layout is determined by the presentation planner. Thus, PPP’s presentation planner accomplishes tasks of three different layers. Furthermore, the sequencing of Persona animations, which is done by the behavior monitor, may be considered as a design task and thus be assigned to the design layer.

4.5. Realization layer

The task of the realization layer is the media-specific encoding of information according to the design specifications which have been worked out in the superordinate Design Layer. In WIP, there is a graphics realization component for rendering 3D and 2D graphics and a text realization component where grammatical encoding, linearization and inflection take place. PPP contains additional realization components for animation, gestures and music. Both systems rely on a layout realization component to determine the spatial arrangement of the output. Unlike WIP, PPP also has a temporal layout realization which designs a schedule considering the constraints delivered by the design layer. Finally, PPP’s character composer may be assigned to the realization layer.

4.6. Presentation display layer

The Presentation Display Layer describes the runtime environment for a presentation. In WIP, it comprises the X11 window manager and an interface to a PostScript printer; in PPP, the X11 window manager, a Java-enhanced WWW browser and an audio player.

4.7. User expert

The WIP and PPP stereotype user models distinguish between novice and expert users. The goals, preferences and knowledge of each user stereotype are stored in different knowledge bases. If a new user interacts with the system, the user model associated with the corresponding stereotype is copied and updated after a goal has been achieved. From that time on, the user is supposed to know the information conveyed by realizing the goal.
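The copy-and-update handling of stereotype user models can be illustrated with a short sketch. The dictionary layout, the fact identifier and the function names are assumptions made for the example; the actual knowledge bases of WIP and PPP are not described at this level of detail.

```python
import copy

# Hypothetical stereotype knowledge bases.
STEREOTYPES = {
    "Novice": {"known": set(), "preferences": {}},
    "Expert": {"known": {"location-of-code-switch"}, "preferences": {}},
}


def new_user_model(stereotype):
    """Copy the stereotype so that later updates affect only this user."""
    return copy.deepcopy(STEREOTYPES[stereotype])


def goal_achieved(model, conveyed_facts):
    """After a goal is achieved, the user is assumed to know its content."""
    model["known"].update(conveyed_facts)


user = new_user_model("Novice")
goal_achieved(user, {"location-of-code-switch"})
```

The deep copy keeps the stereotype itself unchanged, so the next new novice user again starts without the conveyed information.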

4.8. Application and application expert

The application knowledge of WIP and PPP is partly codified as propositions in a terminological logic and partly as geometric wire-frames for the 3D graphics generation. The propositionally represented knowledge is the main source of knowledge about the domain and is used for the generation of both text and graphics.

4.9. Context expert

The context knowledge of WIP and PPP consists of a document design plan which reflects the structure of the document. Furthermore, each system maintains an explicit representation of the syntax and semantics of a presentation by means of encoding relations. In PPP, the context knowledge also comprises information concerning the temporal behavior of a presentation.

Layer | WIP | PPP
Control Layer | Subcomponent of the presentation planner | Event handler and subcomponents of the presentation planner
Content Layer | Subcomponent of the presentation planner | Subcomponents of the presentation planner and the behavior monitor
Design Layer | Design components for text, graphics, and spatial layout | Design components for text, graphics, gestures, animation, music and spatial layout; subcomponents of the presentation planner and the behavior monitor
Realization Layer | Realization components for text, graphics, and spatial layout | Realization components for text, graphics, gestures, animation, music and spatial layout; scheduler and character composer
Presentation Layer | Window manager, interface to a PostScript printer | Window manager, Netscape browser, audio player

Fig. 7. Instantiation of the layers in WIP and PPP.


4.10. Design expert

The design expert of WIP and PPP comprises declaratively coded design strategies, a lexicalized Tree Adjoining Grammar for text generation and lexica for different target languages. In addition, PPP maintains several indexed databases with video frames and cartoons, and predefined scripts.

Fig. 7 summarizes how the individual layers of the reference model have been instantiated in WIP and PPP.

5. Conclusion

In this paper, we have compared two existing multimedia presentation systems by redescribing them in terms of the standard reference model. Our analysis has shown that WIP’s architecture bears strong structural similarities to the modularization into layers proposed in the reference model. Consequently, WIP could be described in a straightforward manner using the model. This was not the case for the more complicated PPP system. We noticed this when describing PPP’s Persona Server, since this component performs tasks which are not explicitly addressed by the reference model. We assume the reason is that the development of the reference model was targeted towards presentation systems like WIP, which focus on the production of multimedia material. However, since animated agents like the PPP Persona are likely to become integral parts of future interfaces, an extension of the reference model to capture this class of presentation systems as well should be envisioned.

Acknowledgements

This work has been supported by the German Federal Ministry of Education, Science, Research and Technology (BMBF) under the contracts ITW 8901 8 and ITW 9400 7. I would like to thank Thomas Rist for his comments on an earlier draft of this paper.

References

[1] M. Bordegoni, G. Faconti, S. Feiner, M. Maybury, T. Rist, S. Ruggieri, P. Trahanias, M. Wilson, A standard reference model for intelligent multimedia presentation systems, Computer Standards and Interfaces 18 (6,7) (1997) 477-496.

[2] E. André, W. Finkler, W. Graf, T. Rist, A. Schauder, W. Wahlster, WIP: the automatic synthesis of multimodal presentations, in: M. Maybury (Ed.), Intelligent Multimedia Interfaces, AAAI Press, 1993, pp. 75-93.

[3] W. Wahlster, E. André, W. Finkler, H.-J. Profitlich, T. Rist, Plan-based integration of natural language and graphics generation, AI J. 63 (1993) 387-427.

[4] E. André, J. Müller, T. Rist, The PPP persona: a multipurpose animated presentation agent, in: Advanced Visual Interfaces, ACM Press, 1996, pp. 245-247.

[5] E. André, T. Rist, The design of illustrated documents as a planning task, in: M. Maybury (Ed.), Intelligent Multimedia Interfaces, AAAI Press, 1993, pp. 94-116.

[6] E. André, T. Rist, Coping with temporal constraints in multimedia presentation planning, in: Proc. of AAAI-96, Vol. 1, Portland, OR, 1996, pp. 142-147.

[7] H.A. Kautz, P.B. Ladkin, Integrating metric and qualitative temporal reasoning, in: Proc. of AAAI-91, 1991, pp. 241-246.

[8] T. Rist, E. André, J. Müller, Adding animated presentation agents to the interface, in: Proceedings of the 1997 International Conference on Intelligent User Interfaces, Orlando, FL, 1997, pp. 79-86.

Dr. rer. nat. Elisabeth André is a project leader at the German Research Center for Artificial Intelligence (DFKI). She has been actively involved in several industrial and academic projects including the WIP (Knowledge-Based Presentation of Information) project, which was honored as an ITEA winner (Information Technology European Awards) in November 1995. Since February 1997, she has been holding the Chair of the ACL Special Interest Group on Multimedia Language Processing (SIGMEDIA). Dr. André has published more than 70 technical papers on language technology and intelligent user interfaces. She is on the editorial board of AI Communications and the area editor for Intelligent User Interfaces of Electronic Transactions on Artificial Intelligence (ETAI). Furthermore, she is editing a special issue on Animated Interface Agents of the Applied Artificial Intelligence Journal. Her current research interests include multimedia authoring, intelligent user interfaces, natural-language processing and life-like characters.
