

The Interplay of User-Centered Dialog Systems and AI Planning

Florian Nothdurft*, Gregor Behnke†, Pascal Bercher†, Susanne Biundo† and Wolfgang Minker*

*Institute of Communications Engineering, †Institute of Artificial Intelligence
Ulm University, Ulm, Germany

florian.nothdurft,gregor.behnke,pascal.bercher,susanne.biundo,[email protected]

Abstract

Technical systems evolve from simple dedicated task solvers to cooperative and competent assistants, helping the user with increasingly complex and demanding tasks. For this, they may proactively take over some of the user's responsibilities and help to find or reach a solution for the user's task at hand, using, e.g., Artificial Intelligence (AI) Planning techniques. However, this intertwining of user-centered dialog and AI planning systems, often called mixed-initiative planning (MIP), does not only facilitate more intelligent and competent systems, but also raises new questions related to the alignment of AI and human problem solving. In this paper, we describe our approach to integrating AI Planning techniques into a dialog system, explain reasons and effects of arising problems, and at the same time provide our solutions, resulting in a coherent, user-friendly, and efficient mixed-initiative system. Finally, we evaluate our MIP system and provide remarks on the use of explanations for MIP-related phenomena.

1 Introduction

Future intelligent assistance systems need to integrate cognitive capabilities to adequately support a user in a task at hand. Adding cognitive capabilities to a dialog system (DS) enables the user to solve increasingly complex and demanding tasks, as solving complex combinatorial problems can be delegated to the machine. Further, the user may be assisted in finding a solution in a more structured way. In this work we focus on the cognitive capability of problem solving with the help of AI Planning technology in the form of Hierarchical Task Network (HTN) planning (Erol et al., 1994; Geier and Bercher, 2011), which resembles the human top-down way of solving problems. Such planners can help users find courses of action, i.e., sequences of actions, which achieve a given goal. In HTN planning the user states the goal in terms of a set of abstract actions, e.g., train(abs), which are repeatedly refined into more concrete courses of action – using so-called methods – during the planning process. For example, train(abs) could be refined into a crunches(20) and a sit-up(50) action. A solution is found if the plan only contains primitive actions, i.e., actions which cannot be refined further, and the plan itself is executable.
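To make the refinement mechanism concrete, the following is a minimal sketch of HTN-style decomposition in Python, using the fitness example above. The data structures and method names are illustrative assumptions, not the planner used in our system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    primitive: bool  # primitive tasks cannot be refined further

# Decomposition methods: each abstract task maps to its possible refinements,
# each refinement being a sequence of (more concrete) tasks.
METHODS = {
    "train(abs)": [
        [Task("crunches(20)", True), Task("sit-up(50)", True)],  # one method
        [Task("plank(60s)", True)],                              # an alternative
    ],
}

def refine(plan):
    """Return all plans obtained by refining the first abstract task in `plan`."""
    for i, task in enumerate(plan):
        if not task.primitive:
            return [plan[:i] + method + plan[i + 1:]
                    for method in METHODS.get(task.name, [])]
    return []  # only primitive tasks remain: a candidate solution

initial = [Task("train(abs)", False)]
print(refine(initial))  # two candidate refinements of the abstract goal
```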

In Section 2, we explain in more detail the advantages of integrating AI planning capabilities into a DS. Such an integration poses significant challenges, however. Most importantly, the way planners search for solution plans is very different from the way humans do, as their main concern is efficiency. In Section 3 we hence show how a planner can be adapted to better suit human needs. Further, we describe which kinds of planning-specific phenomena cannot be avoided and must thus be handled by the dialog manager. In Section 4 we describe the architecture of our intertwined system, followed by some remarks on the implementation in Section 5. Within this section we also discuss why a common source of knowledge for both the planner and the dialog manager is needed and how it can be created. Section 6 contains an evaluation of the implemented system in a fitness-training scenario.

2 Why Integrate a Planner?

In classical use-case scenarios for HTN planners (Nau et al., 2005; Biundo et al., 2011) a plan is generated without any user involvement, besides the specification of the goal, and is afterwards presented to the user. Hence, the planner is a black box to the user, which is often not adequate. If executing the plan involves grave risks, e.g., in military settings (Myers et al., 2002) or spaceflight (Ai-Chang et al., 2004), humans must have the final decision on which actions are to be contained in the plan. Planning systems can also be utilized to create plans for personal tasks like fitness training, cooking, or preparing a party. Here, it is expected that created plans are highly individualized, i.e., that they not only achieve given goals but also respect the user's wishes about the final plan. One might argue that such individualization could be achieved by integrating preferences or action costs into planning (Sohrabi et al., 2009). However, this approach requires that the user can specify his preferences completely and a priori, and that they are expressible, e.g., in terms of action or method costs or LTL formulae. Even if the user's preferences were expressible, the user would have to be questioned extensively prior to the actual interaction, which is very likely to result in the user aborting the interaction.

Instead, the dialog manager and especially the planner should learn about the user's preferences during the interaction, fostering an understanding of the user's preferences. This requires the integration of the user into the planning process, resulting in a so-called mixed-initiative planning (MIP) system. A few approaches to creating such systems have already been investigated (Myers et al., 2003; Ai-Chang et al., 2004; Fernández-Olivares et al., 2006). Unfortunately, they cannot support a complete dialog with the user, as they mostly react to inquiries and demands from the user and only present the user with completed plans, i.e., plans that have already been refined into a solution. We deem such schemes impractical, as they require the user to comprehend a (potentially complicated) solution at once, making it hard to express opinions or wishes about it. We consider it more natural to integrate the user iteratively during plan generation, which on the one hand makes it easier for the user to comprehend the plan and the options to refine it, and on the other hand reduces the cognitive load, as the user does not have to understand the complete plan at once.

MIP can be interpreted as the system-initiated integration of the user into the planning process; from the user's perspective, however, it is the attempt to solve problems using the promised competencies of a technical system. The user delegates planning and decision-making to a technical system with the intent of finding a solution he is not able to find at all, or only with great effort. It aims at relieving the user's cognitive load and simplifying the problem at hand. Hence, the iterative integration of the user seems not only more natural, but also more practical for the user.

3 Challenges of MIP

In this section, we describe the challenges arising in MIP. We discuss the differences between state-of-the-art AI Planning and the way humans solve problems, as they raise issues for a successful integration of a planner into a DS. To achieve such an integration nevertheless, we show how a planner can be modified to accommodate these differences and which issues must be addressed by an advanced dialog management (DM).

How to Integrate the Planner. The integration of AI Planning begins with the statement of a planner objective in a dialog. This requires, on the one hand, a user-friendly and efficient objective-selection dialog and, on the other hand, the creation of a valid planning problem. Thus, the semantics of the dialog has to be coherent with the planning domain, resulting in a valid planning problem.

User-friendly Search Strategies. Almost all AI Planning systems use efficient search strategies, such as A* or greedy search, to find a solution for a given planning problem. The order in which plans are visited is based upon a heuristic estimating the number of modifications needed to refine the given plan into a solution. Plans with smaller heuristic values are hence regarded as more promising and visited earlier. In A* search, as well as in any other heuristic-based search approach, it may happen that after one plan has been explored, the next one explored is an arbitrary plan within the search space – not necessarily a direct successor of the plan explored last. As such, these strategies may create the perception that the user's decisions influence the planning process only arbitrarily, which in turn may result in an experienced lack of control and transparency.

In contrast, humans tend to search for a plan by repeatedly refining the last one. A search strategy that resembles this behavior is depth-first search (DFS). Here, always the plan explored last is refined, until a solution is found or the current plan is proved unsolvable, i.e., it cannot possibly be refined into a solution. In that case, the last refinement is reverted and another possible refinement option is chosen. If none exists, the process is repeated recursively. A major drawback of DFS is that it does not consider a heuristic to select plans which are more promising, i.e., closer to a solution. DFS is blind, leading to an inefficient search and non-optimal plans, i.e., the final plan may contain unnecessary actions. This problem can be addressed if the planner prefers refinements of plans with lower heuristic value whenever the user is indifferent about them. This scheme enables the interplay of user decisions and the planner's ability to solve complex combinatorial problems. The user controls the search by selecting preferred refinements until such a problem arises. Then he may signal the DS that he does not care about the remaining refinement, resulting in the planner using its heuristic to find a valid solution.
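The scheme just described can be made concrete with a small sketch: a depth-first search in which the user picks among heuristically sorted refinements, or defers to the planner by signaling "don't care". The functions refinements, heuristic, is_solution, and ask_user are assumed stubs, not our actual implementation.

```python
def mip_dfs(plan, refinements, heuristic, is_solution, ask_user):
    """Depth-first MIP search: user-chosen refinements first, heuristic fallback."""
    if is_solution(plan):
        return plan
    options = sorted(refinements(plan), key=heuristic)  # most promising first
    if not options:
        return None                      # current plan unsolvable: backtrack
    choice = ask_user(options)           # a plan from options, or None ("don't care")
    if choice is not None:
        options = [choice] + [o for o in options if o is not choice]
    for option in options:               # on "don't care", heuristic order decides
        result = mip_dfs(option, refinements, heuristic, is_solution, ask_user)
        if result is not None:
            return result
    return None                          # all refinements failed: revert decision
```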

Handling of Failures During Planning. A major difference between human problem solving and DFS is the way failures are handled. In planning, a failure occurs if the current plan is proved unsolvable (e.g., by using well-informed heuristics). As mentioned earlier, DFS uses backtracking to systematically explore all remaining options for decisions that led to a failure, until a successful option has been found. Practical heuristics are necessarily imperfect, i.e., they cannot determine for every plan whether it is unsolvable or may still be refined into a solution. Hence, even when using a well-informed heuristic, the planner will present the user with options for refining a plan that will inevitably cause backtracking. DFS backtracking is a very tedious and frustrating process, especially if the faulty decision was made early on but is detected much later. Large parts of the search space, i.e., all plans that can be refined based on the faulty decision, have to be explored manually until the actually faulty decision is reverted. This may result in the user deeming the system's strategy naive and the system itself incompetent. This matters, since the use of automation correlates highly with a user's trust in an automated system, which in turn depends mainly on the perception of its competence (Muir and Moray, 1996). To prevent the user, at least partially, from perceiving the system as incompetent, we can utilize the computing power of the planner to determine whether an option for refinement leads only to faulty plans. We call such refinements dead-ends. While options are presented to the user, the planner explores the search space induced by each refinement using an efficient search procedure. If it determines that such a search space cannot contain a solution, the respective refinement is a dead-end. It is the responsibility of the dialog manager to appropriately convey this information to the user, especially if computing it took noticeable time, i.e., the user has already considered the presented options and one has to be removed.
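The dead-end detection described above can be sketched as a background computation that runs while the options are displayed. Here solvable(option) stands for an assumed stub running an efficient search (e.g., A*) over the search space induced by the refinement; the UI callback is likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def detect_dead_ends(options, solvable, remove_from_ui):
    """Explore each option's search space in parallel; flag provable dead-ends."""
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(solvable, opt): opt for opt in options}
        for future in as_completed(futures):
            if not future.result():          # search space holds no solution
                # The dialog manager must convey this carefully, since the user
                # may already be considering the option that now disappears.
                remove_from_ui(futures[future])
```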

How to Integrate the User. Another important factor for a successful human-computer interaction is the question of when a user should be involved in the planning process and, if so, how to do it. Clearly, the planner should not be responsible for this kind of decision as it lacks the necessary capabilities, but it may contribute information, e.g., by determining how critical the current decision is with respect to the overall plan. From the planner's view every choice is delegated to the user via the DM, achieving maximal flexibility for the manager. The dialog manager, on the other hand, can either perform an interaction with the user or determine by itself that the choice should be made by the planner, which is equivalent to the user signaling "Don't care". It should be considered whether an interaction is critical and required to successfully continue the dialog or to achieve short-term goals, but risks the user's cooperativeness in the long run, e.g., by overstraining his cognitive capabilities or boring him. If the user is to be involved, the question arises how this should be rendered, i.e., what kind of integration is the most beneficial. If he is not, the dialog manager must additionally decide whether and, if so, how he should be informed of the decisions the planner has made for him.

4 Concept and Design

The integration of AI Planning and user-centered dialog begins with the statement of an objective. This first dialog between user and machine has the goal of defining the task in a way understandable for the planner. Once the problem is passed to the planner, the interactive planning itself may start. Using the described depth-first search, the plan is refined by selecting appropriate modifications for open decisions. In order to decide whether to involve the user during this process, an elaborate decision model, integrating various information sources, is required. Relevant information sources are, e.g., the dialog history (e.g., was the user's decision the same for all past similar episodes?), the kind of plan flaw (e.g., is this flaw relevant for the user?), the user profile (e.g., does the user have the competencies for this decision?), or the current situation (e.g., is the current cognitive load of the user low enough for an interaction?). These examples of relevant information sources illustrate that a decision model cannot be located in either the DM or the planner alone, but must reside in a superordinate component, the so-called Decision Model.

In case of user involvement, the information on the current plan decision has to be communicated to the user. This means that the open decision and the corresponding choice between available modifications have to be represented in a dialog suitable for the user. Hence, the corresponding plan information needs to be mapped to human-understandable dialog information. As this mapping is potentially required for every piece of plan information and, vice versa, for every piece of dialog information, coherent models between planner and DS become crucial for MIP systems. Thoroughly matching both models by hand would be an intricate and strenuous process, requiring constant maintenance, especially when the models need to be updated. Thus, a more appropriate approach is the automatic generation of the respective models from one mutual model as the source, the Mutual Knowledge Model. This way, once the transformation functions work correctly, coherence is no longer an issue, even when updating the domain. How these essential constituents of a conceptual MIP system architecture (depicted in Figure 1) were implemented in our system is explained in the next section.

[Figure: block diagram connecting the Planning Framework, the Decision Model (interactive heuristic, domain heuristic, user presentation; informed by the heuristic, interaction history, and user profile), the Dialogue Management with its interfaces, sensors, the Mutual Knowledge Model (mutual domain), the user, and the environment.]

Figure 1: Essential components of a MIP system.

5 Implementation

We implemented and tested a multimodal MIP system using a knowledge-based cognitive architecture (Bercher et al., 2014). The multimodal interface uses speech and graphical user input as well as output. The Dialog Management uses a modality-independent representation, communicating with the user via the Fission (Honold et al., 2012), User Interface (Honold et al., 2013), and Fusion (Schüssel et al., 2013) modules. Here, we describe in more detail the two components which are of particular interest for MIP systems: the Mutual Knowledge Model and the Decision Model.

5.1 Mutual Knowledge Model

This model, from which the planning and dialog domain models are automatically generated using automated reasoning, is crucial for a coherent MIP system. It is implemented in the form of an OWL ontology (W3C OWL Working Group, 2009). By generating the HTN planning domains from the ontology, a common vocabulary is ensured – for every planning task a corresponding concept exists in the ontology. The hierarchical structures (i.e., decomposition methods) inherent in HTN planning are derived using declarative background knowledge modeled in the ontology. For the delicacies of modeling planning domains in an ontology (e.g., how to model ordering or preconditions), including proofs for the transformation and remarks on the complexity of this problem, see Behnke et al. (2015).

The model is also utilized to infer a basic dialog structure, which is needed for the user to specify the objective for the planner. Using a mutual model addresses one of the challenges of MIP, since translation problems between dialog and planner semantics can be prevented. For the dialog domain generation, a mapping between ontology concepts and dialogs is used. The dialog hierarchy can be derived using ontology knowledge: a dialog A can be decomposed into a sequence of sub-dialogs containing the dialogs B1, . . . , Bn by an axiom of the form Class: A EquivalentTo: includes only some [B1, . . . , Bn], which is interpreted by the dialog system as a corresponding decomposition method. For example, a strength training can be conducted using a set of workouts A1, . . . , An, each of which consists of a set of exercises B1, . . . , Bn. This way a dialog hierarchy can be created, using the topmost elements as entry points for the dialog between user and machine. Nevertheless, this results only in a valid dialog structure, not necessarily in the most suitable one for the individual user. For this, concepts of the ontology can be excluded from the domain generation or linked to other elements in an XML configuration file. This way unimportant elements can be hidden or rearranged for the user. The dialogs are also relevant during the MIP process. When selecting between several Plan Modifications, these have to be translated into a format understandable by the user. Hence, in addition to the knowledge used to generate plan steps, resources are required for communicating these steps to the user. Therefore, texts, pictures, or videos are needed, which can easily be referenced from an ontology. Using this information, dialogs suitable for a well-understandable human-computer interaction can be created and presented to the user.
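As a rough illustration of this generation step, the following sketch derives a dialog hierarchy from decomposition axioms that have already been extracted from the ontology into plain (parent, children) pairs. The concept names are hypothetical, and the real system performs this derivation via ontological reasoning rather than over hard-coded dictionaries.

```python
# Decomposition axioms of the form "A EquivalentTo: includes ... [B1, ..., Bn]",
# pre-extracted into plain parent -> children pairs (hypothetical names).
AXIOMS = {
    "StrengthTraining": ["UpperBodyWorkout", "LegWorkout"],
    "UpperBodyWorkout": ["BenchPress", "Crunches"],
    "LegWorkout": ["BarbellSquat", "DumbbellSquat"],
}
HIDDEN = {"Crunches"}  # concepts excluded via the XML configuration file

def dialog_tree(concept):
    """Build the sub-dialog hierarchy rooted at `concept`, skipping hidden nodes."""
    children = [c for c in AXIOMS.get(concept, []) if c not in HIDDEN]
    return {concept: [dialog_tree(c) for c in children]}

# The top-most concepts serve as entry points for the user-machine dialog.
print(dialog_tree("StrengthTraining"))
```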

One key aspect of state-of-the-art DSs is the ability to individualize the ongoing dialog according to the user's needs, requirements, preferences, or history of interaction. Coupling the generation of the dialog domain to the ontology enables us to accomplish these requirements using ontological reasoning and explanation in various ways. The dialogs can be pruned using ontological reasoning according to the user's needs (e.g., "show only exercises which do not require gym access") or requirements (e.g., "show only beginner exercises"), or adapted to the user's dialog history (e.g., "preselect exercises which were used the last time") and preferences (e.g., "present only exercises with dumbbells"). Additionally, integrating proactive as well as requested explanations into the interaction is an important part of imparting the used domain knowledge and clarifying system behavior. Using a coherent knowledge source to create dialog and planning domains enables us to use predefined declarative explanations (Nothdurft et al., 2014) together with dynamically generated plan explanations (Seegebarth et al., 2012) and explanations for ontological inferences (Schiller and Glimm, 2013) without dealing with inconsistency issues. This way Plan Steps (e.g., exercises) can be explained in detail, dependencies between plan steps can be explained to exemplify the necessity of tasks (i.e., plan explanation), and ontology explanations can justify the decompositions from which the planning model and the dialog domain were generated. All of these increase the user's perceived system transparency.

5.2 Decision Model

This model is in charge of deciding when and how to involve the user in the planning process. It is the interface to the planner and decides, upon planner requests, whether a user involvement is useful. For this, it includes a list of essential domain decisions that are interesting and relevant for the user (e.g., for a training domain: day, workout, and exercises) – the rest is left to the fallback heuristic and thus decided by the planner. Hence, the user is only involved in the decision making if a user-relevant planning decision is pending (e.g., "Which leg exercise do you prefer?"). If the model decides in favor of user involvement, the open decision and its modifications have to be passed to the user. Hence, a decision on the form of user integration has to be made. The dialog may consist of the complete set of modifications, a pruned or sorted list, an implicit or explicit confirmation of a system-made preselection, or merely of information to the user. This decision depends not only on the interaction history, but also on additional information (e.g., affective user states like overextension, interest, or engagement) stored in the user state.
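A minimal sketch of this gating logic follows, with the whitelist of user-relevant decision types from the training domain; the user-state check and all function names are assumptions for illustration.

```python
USER_RELEVANT = {"day", "workout", "exercise"}  # decisions worth asking about

def route_decision(decision_type, modifications, user_state, ask_user, planner_pick):
    """Route a pending planner decision to the user or to the fallback heuristic."""
    if decision_type not in USER_RELEVANT:
        return planner_pick(modifications)     # planner decides silently
    if user_state.get("overextended", False):  # e.g. affective state advises against asking
        return planner_pick(modifications)
    answer = ask_user(modifications)           # may be None for "don't care"
    return answer if answer is not None else planner_pick(modifications)
```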

The Decision Model also records the dialog and planning history. There are several reasons for that: the dialog history may enable a prediction of future user behavior (e.g., in selections), and additionally this knowledge is mandatory for backtracking processes when the current plan does not lead to a solution. The history saves which decisions were made by the user. In case of backtracking, the decisions are undone step-by-step, with the goal of finding a solution by applying alternative modifications. Whenever a user-made decision is undone, the user is notified, because this system behavior would otherwise appear irritating.
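This history-based backtracking can be sketched as a stack of recorded decisions that are undone step-by-step, notifying the user whenever one of his own decisions is reverted. The stub untried_alternatives is an assumption standing in for the planner's bookkeeping of remaining modifications.

```python
history = []  # stack of (plan, decision, made_by_user) entries

def record(plan, decision, made_by_user):
    history.append((plan, decision, made_by_user))

def backtrack(notify_user, untried_alternatives):
    """Undo decisions until one with an untried alternative modification is found."""
    while history:
        plan, decision, made_by_user = history.pop()
        if made_by_user:
            notify_user(decision)   # silent reverts would appear irritating
        alternatives = untried_alternatives(plan, decision)
        if alternatives:
            return plan, alternatives
    return None, []                 # nothing left to revert: problem unsolvable
```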

Since backtracking as well as dead-ends are peculiar phenomena of a MIP system, their communication may critically influence the user experience. Together with the DM, the Decision Model orchestrates the corresponding system behavior. The main difference between backtracking and dead-ends is the temporal order of becoming aware of the unsolvable plan and making the decision: for backtracking, the awareness is achieved after the decision, and for dead-ends, during the decision. As we assumed that backtracking would impair the user experience significantly, a parallel search for dead-ends, as described in Section 3, was implemented. This process is, of course, inherently different from backtracking, but it may prevent it. Removing dead-ends from the search space when the relevant modification is not part of the current selection is a rather easy task. Otherwise, the current selection has to be modified to prevent the user from selecting a dead-end. However, removing an option from the list without any notification seems like confusing behavior. As we had only hypotheses on the effects of these peculiar events, as well as on the effects of the different forms of integrating the user into the planning process, we conducted a matching user study.

6 Evaluation

MIP may lead to the user experiencing planning-related phenomena, such as backtracking or dead-ends. These phenomena revoke decisions made by the user or alter the present decision-making and may therefore influence the user's experience of the system. As mentioned before, this may impair the perceived competency of the system, leading to a loss of trust, which correlates highly with a reduced use of automation (Muir and Moray, 1996). As our MIP system aims at assisting the user in complex and demanding tasks, maintaining the user's trust in the system, and thereby the willingness to let the system decide autonomously, is crucial. Furthermore, previous research has shown that the use of explanations can help to address trust issues related to intelligent adaptive systems (Glass et al., 2008) and that it may reduce negative effects in incomprehensible situations (Nothdurft et al., 2014). Therefore, we assessed the effects of MIP phenomena like backtracking and dead-ends on the user experience and tested different strategies, especially explanations, for communicating these events to the user.

6.1 Methodology

For the subjective measurement through self-ratings by the users, questionnaires were used. The most fundamental user data is personal information, covering age, gender, and education. We also asked for the participants' experience in the general use of technical systems and their foreknowledge in the corresponding domain of the experiment. Apart from these personal data, we included a number of standardized and validated questionnaires: Human-Computer Trust (HCT) describes the trust relationship between human and machine and was assessed using the questionnaire by Madsen and Gregor (2000), measuring five dimensions (Perceived Understandability, Perceived Reliability, Perceived Technical Competence, Personal Attachment, Faith). The AttrakDiff questionnaire extends the assessment of a DS, or software in general, from the limited view of usability, which represents mostly pragmatic quality, to the integration of scales measuring hedonic qualities. This questionnaire was developed by Hassenzahl et al. (2003) and measures the perceived pragmatic quality, the hedonic qualities of stimulation and identity, and the attractiveness in general. In total, 104 participants took part in the experiment. On average the subjects were 23.9 years old, with the youngest being 18 and the oldest 41. Gender-wise, the participants were almost equally distributed, with 52.9% males and 47.1% females.

In this scenario the user's task was to create individual strength-training workouts. In each strength-training workout at least three different muscle groups had to be trained and exercises chosen accordingly. The user was guided through the process by the system, which provided a selection of exercises for training each specific muscle group necessary for the workout. For example, when planning a strength training for the upper body, the user had to select exercises to train the chest (see Figure 2). This selection corresponds to the integration of the user into the MIP process: the decision of how to refine the task of training the chest is not made by the system, but left to the user.

Figure 2: Screenshot of the UI. Here, the user has to select one chest exercise. If he does not want to decide, "Let the System decide" can be clicked or said, starting the planner for this decision.

6.2 MIP Phenomena

For the MIP phenomena we implemented four variants: Backtracking with Notification (BT-N), where the system informs the user that previously user-made decisions will not lead to a solution and have to be undone; Dead-End Notification before (DE-B), where the user was presented the notification beforehand, on an extra slide, that some refinements had to be deactivated in the upcoming selections because they would not lead to a solution; and Dead-End Notification during (DE-D), where each selection slide provided the notification that some of the options below had been deactivated because they would not lead to a solution. The fourth variant tested the effect of the content of the notification: Backtracking with Explanation (BT-E) additionally explained to the user why the already made decisions had to be rolled back. The participants were assigned randomly to the variants, resulting in 25 participants for BT-N, 18 for BT-E, 18 for DE-B, and 21 for DE-D (after excluding unusable subjects).

6.2.1 Temporal Contiguity

Our first hypothesis was that, in general, the temporal contiguity of the system-provided notification affects the user experience. We assumed that providing the notification before the upcoming selection would perform better than providing it on the selection itself, and considerably better than providing it after the decision, because the amount of consequences for the user is directly influenced by the temporal contiguity of the notification. In terms of HCT, we expected that especially the bases of perceived reliability and understandability would be impaired more strongly the later the notification arrives.

Results. There was a statistically significant difference between the groups with notifications (i.e., without BT-E) in the HCT bases (see Figure 3), as determined by one-way ANOVA, for perceived reliability (F(2, 70) = 3.548, p = .034), perceived understandability (F(2, 70) = 4.391, p = .016), and personal attachment (F(2, 44) = 3.401, p = .042). Analyzing the AttrakDiff questionnaire data, we found a statistically significant difference between groups for the dimension of hedonic quality – stimulation, as determined by one-way ANOVA (F(2, 44) = 3.266, p = .048). The Fisher LSD post hoc test revealed a statistically significant difference at the .05 level between BT-N (M = 4.04, SD = 1.03) and DE-D (M = 3.26, SD = .72). DE-D also differed significantly at the .05 level from DE-B (M = 4.12, SD = 1.03).

Discussion. The results support our hypothesis that the temporal contiguity of the notification does indeed influence the user experience. A technical system is perceived as most reliable when the notification is presented before the decision-making (DE-B), because no unexpected events occur, and as least reliable when user decisions have to be undone (BT-N). For perceived understandability, presenting the notification during the decision performed best, perhaps because the deactivation of selection options could be attributed more directly to the notification itself, fostering the user's understanding of the situation. Personal attachment was impaired most when using notifications during decision making. Overall, the results support that a positive temporal contiguity (i.e., DE-B) seems to be the best option for a technical system: while its understandability ranks only second best, the perceived reliability, personal attachment, overall cognitive load, difficulty, fun, extraneous load, pragmatic as well as hedonic qualities, and overall attractiveness perform either best or as well as in the other notification-only conditions. The notification, which represents only a shallow justification of the experienced system behavior, thus already seems to be important for the perceived user experience. Hence, we evaluated how a real explanation would perform compared to such shallow justifications (i.e., the notification condition).

6.2.2 The Effects of Explaining MIP

For testing the effects of an extensive explanation, we exchanged the backtracking notification for an explanation. The notification that the made decisions would not lead to a solution was replaced with "the system has detected that the gym is closed today due to a severe water damage. Therefore, you have to decide again and select exercises suitable for training at home". This condition (BT-E) was then compared to the notification condition (BT-N) using a pairwise t-test.

Results. Examining the HCT bases, we found significant differences between BT-N (M = 2.8, SD = 1.05) and BT-E (M = 3.56, SD = .82) for perceived reliability (t(57) = 3.0, p = .004). For perceived understandability, the means differed significantly (t(57) = 3.99, p < .001) between BT-N (M = 2.40, SD = 1.05) and BT-E (M = 3.44, SD = .87). For perceived technical competence, BT-N (M = 2.75, SD = 1.03) and BT-E (M = 3.28, SD = .66) also differed significantly (t(41) = 2.06, p = .045).

In the AttrakDiff, we observed a significant difference (t(41) = 2.37, p = .022) for the dimension of experienced pragmatic qualities, comparing BT-E (M = 4.86, SD = 1.16) and BT-N (M = 4.01, SD = 1.15). Taking a closer look at the word pairs, significant differences below the .05 level were found for complicated–simple, unpredictable–predictable, confusing–clearly structured, unruly–manageable, as well as unpleasant–pleasant.

Figure 3: Average means of the bases of human-computer trust (reliability, understandability, technical competence, faith, personal attachment) for each variant (BT-E, BT-N, DE-B, DE-D) on a 5-point Likert scale (strongly disagree – strongly agree). The whiskers represent the standard errors of the mean.

Discussion. Providing detailed explanations of the system behavior, in this case of backtracking, does indeed help the user to perceive the system as more reliable, more understandable, and more technically competent. As only the backtracking notification was modified, we can only surmise that the effect would be similar for the other variants (i.e., DE-D and DE-B). However, this seems plausible, because the goal of increasing the system's transparency to the user can be achieved using similar explanations as well. Taking a look at the AttrakDiff and its single word pairs, it becomes obvious that explaining system behavior helps to improve the pragmatic qualities of a system compared to providing no or only minimal notifications. Systems with explanation capabilities seem to be perceived as less complicated, more predictable, more manageable, more clearly structured, and in general as more pleasant.

Experiment Conclusion. Combining the results of both evaluated factors (i.e., temporal contiguity and explanations), we argue that the best option for MIP system behavior is to explain to the user beforehand why, e.g., several options have been pruned from a selection. This strengthens the need, on the one hand, for intelligent and understandable explanation capabilities of such systems and, on the other hand, for integrating the user into the decision making only when the system is sure that the presented options do in fact, or at least most probably, lead to a solution. Otherwise, the negative effects of occurring backtracking and similar planning peculiarities will impair the relationship between human and machine.

7 Conclusion

In this paper we pointed out the importance for future intelligent systems of intertwining dialog systems and AI Planning into a MIP system. First, we elucidated the potentials, but also the risks and arising problems, of these mixed-initiative systems. On the one hand, humans can profit from planning techniques like the parallel exploration that excludes invalid planning paths from the search space. On the other hand, planning-related events like backtracking or dead-ends may impair the user experience. Second, we described our approach to a coherent and user-friendly mixed-initiative system. This included the use of a mutual knowledge model, in the form of an ontology, to generate coherent domain models for dialog and planning, as well as the development of a superordinate decision model controlling who is in charge of the decision-making process. Furthermore, we evaluated our implementation with respect to the effects of MIP events and tested different strategies to handle those. Concluding, we remark that the potentials of integrating AI planning into a DS have to be weighed against drawbacks like backtracking or dead-ends and their effects on the user experience. However, increasing the user's perceived system transparency by including valid explanations of these behaviors may mitigate the negative effects, thus increasing the potential areas of application for this kind of mixed-initiative system.

Acknowledgment

This work was supported by the Transregional Collaborative Research Centre SFB/TRR 62 "Companion-Technology for Cognitive Technical Systems", which is funded by the German Research Foundation (DFG).


References

Mitchell Ai-Chang, John Bresina, Len Charest, Adam Chase, JC-J Hsu, Ari Jonsson, Bob Kanefsky, Paul Morris, Kanna Rajan, Jeffrey Yglesias, et al. 2004. MAPGEN: Mixed-initiative planning and scheduling for the Mars Exploration Rover mission. IEEE Intelligent Systems, 19(1):8–12.

Gregor Behnke, Denis Ponomaryov, Marvin Schiller, Pascal Bercher, Florian Nothdurft, Birte Glimm, and Susanne Biundo. 2015. Coherence across components in cognitive systems – one ontology to rule them all. In Proc. of the 24th Int. Joint Conf. on Artificial Intelligence (IJCAI 2015). AAAI Press.

Pascal Bercher, Susanne Biundo, Thomas Geier, Thilo Hoernle, Florian Nothdurft, Felix Richter, and Bernd Schattenberg. 2014. Plan, repair, execute, explain – how planning helps to assemble your home theater. In Proc. of the 24th Int. Conf. on Automated Planning and Scheduling (ICAPS 2014), pages 386–394. AAAI Press.

Susanne Biundo, Pascal Bercher, Thomas Geier, Felix Müller, and Bernd Schattenberg. 2011. Advanced user assistance based on AI planning. Cognitive Systems Research, 12(3–4):219–236. Special Issue on Complex Cognition.

Kutluhan Erol, James A. Hendler, and Dana S. Nau. 1994. UMCP: A sound and complete procedure for hierarchical task-network planning. In Proc. of the 2nd Int. Conf. on Artificial Intelligence Planning Systems (AIPS 1994), pages 249–254. AAAI Press.

Juan Fernández-Olivares, Luis A. Castillo, Óscar García-Pérez, and Francisco Palao. 2006. Bringing users and planning technology together: Experiences in SIADEX. In Proc. of the 16th Int. Conf. on Automated Planning and Scheduling (ICAPS 2006), pages 11–20. AAAI Press.

Thomas Geier and Pascal Bercher. 2011. On the decidability of HTN planning with task insertion. In Proc. of the 22nd Int. Joint Conf. on Artificial Intelligence (IJCAI 2011), pages 1955–1961. AAAI Press.

Alyssa Glass, Deborah L. McGuinness, and Michael Wolverton. 2008. Toward establishing trust in adaptive agents. In Proc. of the 13th Int. Conf. on Intelligent User Interfaces (IUI 2008), pages 227–236. ACM.

Marc Hassenzahl, Michael Burmester, and Franz Koller. 2003. AttrakDiff: Ein Fragebogen zur Messung wahrgenommener hedonischer und pragmatischer Qualität. In Gerd Szwillus and Jürgen Ziegler, editors, Mensch & Computer 2003: Interaktion in Bewegung, pages 187–196, Stuttgart. B. G. Teubner.

Frank Honold, Felix Schüssel, and Michael Weber. 2012. Adaptive probabilistic fission for multimodal systems. In Proc. of the 24th Australian Computer-Human Interaction Conf. (OzCHI 2012), pages 222–231, New York, NY, USA. ACM.

Frank Honold, Felix Schüssel, Michael Weber, Florian Nothdurft, Gregor Bertrand, and Wolfgang Minker. 2013. Context models for adaptive dialogs and multimodal interaction. In Proc. of the 9th Int. Conf. on Intelligent Environments (IE 2013), pages 57–64. IEEE.

Maria Madsen and Shirley Gregor. 2000. Measuring human-computer trust. In Proc. of the 11th Australasian Conf. on Information Systems, pages 6–8. ISMRC, QUT.

Bonnie M. Muir and Neville Moray. 1996. Trust in automation. Part II: Experimental studies of trust and human intervention in a process control simulation. Ergonomics, 39(3):429–460.

Karen L. Myers, W. Mabry Tyson, Michael J. Wolverton, Peter A. Jarvis, Thomas J. Lee, and Marie desJardins. 2002. PASSAT: A user-centric planning framework. In Proc. of the 3rd Int. NASA Workshop on Planning and Scheduling for Space, pages 1–10.

Karen L. Myers, Peter A. Jarvis, W. Mabry Tyson, and Michael J. Wolverton. 2003. A mixed-initiative framework for robust plan sketching. In Proc. of the 13th Int. Conf. on Automated Planning and Scheduling (ICAPS 2003), pages 256–266. AAAI Press.

D. Nau, T.-C. Au, O. Ilghami, U. Kuter, D. Wu, F. Yaman, H. Muñoz-Avila, and J. W. Murdock. 2005. Applications of SHOP and SHOP2. IEEE Intelligent Systems, 20(2):34–41.

Florian Nothdurft, Felix Richter, and Wolfgang Minker. 2014. Probabilistic human-computer trust handling. In Proc. of the Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 51–59. ACL.

Marvin Schiller and Birte Glimm. 2013. Towards explicative inference for OWL. In Proc. of the Int. Description Logic Workshop, volume 1014, pages 930–941. CEUR.

Felix Schüssel, Frank Honold, and Michael Weber. 2013. Using the transferable belief model for multimodal input fusion in companion systems. In Multimodal Pattern Recognition of Social Signals in HCI, volume 7742 of LNCS, pages 100–115. Springer.

Bastian Seegebarth, Felix Müller, Bernd Schattenberg, and Susanne Biundo. 2012. Making hybrid plans more clear to human users – a formal approach for generating sound explanations. In Proc. of the 22nd Int. Conf. on Automated Planning and Scheduling (ICAPS 2012), pages 225–233. AAAI Press.

Shirin Sohrabi, Jorge Baier, and Sheila A. McIlraith. 2009. HTN planning with preferences. In Proc. of the 21st Int. Joint Conf. on Artificial Intelligence (IJCAI 2009), pages 1790–1797. AAAI Press.

W3C OWL Working Group. 2009. OWL 2 Web Ontology Language: Document Overview. Available at http://www.w3.org/TR/owl2-overview/.


8 Appendix A: Example Planning and Dialog Snippet

[Figure: the abstract task "Legs" within a plan (Init, warmup, BarbellDeadlift(8, 60), Goal), its available modifications (BarbellSquat, DumbbellSquat, …), the corresponding "Leg Training" selection dialog ("Chose your exercise:", with the options BarbellSquat, DumbbellSquat, "Don't care", and "Back"), and the resulting plan containing BarbellSquat(?, ?).]

Figure 4: An example planning and dialog snippet. The abstract planning task "Legs" has to be decomposed into a primitive task. In this case, the decision which of the modifications is to be included in the plan is made by the user. A screenshot of an exemplary decision-making by the user was presented in Figure 2. Afterwards, the abstract task is refined using the selected modification and integrated into the plan.