JOURNAL OF MULTI-CRITERIA DECISION ANALYSIS

J. Multi-Crit. Decis. Anal. 9: 243–262 (2000)

Dynamic Decision Making: A Comparison of Approaches

PAULO C.G. DA COSTA and DENNIS M. BUEDE*
Department of Systems Engineering and Operations Research, George Mason University, Fairfax, Virginia, USA

ABSTRACT

This paper is concerned with a specific type of problem, namely dynamic decisions, for which most techniques fail to provide adequate solutions. Here, we present two of the most promising optimization techniques, partially observable Markov decision processes (POMDP) and dynamic decision networks (DDN), while arguing which is the most suitable for this problem domain. Copyright © 2000 John Wiley & Sons, Ltd.

KEY WORDS: Bayesian updating; DDN; dynamic decisions comparison; decision support; multi-criteria; POMDP

1. INTRODUCTION

Making decisions is an everyday activity for all of us, and we spend a great deal of our lives deciding how and when to act in a decision situation. Fortunately, most of our problems are fairly simple and do not demand excessive effort and thought to reach reasonable solutions. Nevertheless, there are times when choosing the best decision, or set of interrelated decisions, is an important, challenging and time-consuming task. This is where scientific decision techniques are a valuable asset.

Most of these tools are based on a 'divide and conquer' approach, in which a complex problem is solved by dismantling it into smaller, less complicated parts through a process that we may call decision modelling. A model is more likely to be useful if it accurately reflects the central characteristics of the real situation. For decision problems, these characteristics are the set of available options, the criteria to be followed when choosing an option from this set, and the uncertainty about the outcomes of each option. For the values, the theoretical support is provided by the postulates of Utility Theory (Von Neumann and Morgenstern, 1944; Savage, 1954, 1961), further extended to the more comprehensive framework of Multi-Attribute Utility Theory (Keeney and Raiffa, 1976; French, 1986; Watson and Buede, 1987), while the uncertainty is handled by the axioms of Probability Theory (Laplace, 1951; Keynes, 1957; Weatherford, 1982; French, 1986).

2. DYNAMIC DECISIONS

The decision-maker should be aware of the match between the particulars of the modelling process and the problem's characteristics. Using a sophisticated decision analysis technique to solve a lesser problem can be a great waste of time and money.

For most problems, there will be a handful of techniques that can be successfully applied. However, in this paper we are concerned with a specific class of problem, namely dynamic probabilistic systems. In such an environment, decisions are taken at different points in time, with uncertain outcomes or conflicting objectives affecting the decisions that are to be taken later, in a process that is usually called dynamic decision making.

An insightful representation of the decision problem domain can be found in Howard (1998), where a given situation may be classified by the number of its variables, by its relationship with the time domain, and by its degree of uncertainty. Clearly, problems with many variables in a dynamic situation with significant uncertainty are the most difficult; an 'optimal' solution for this problem class is beyond our grasp for most problems today.

In addition to Howard’s three problem dimen-sions, a dynamic probabilistic decision system

* Correspondence to: Science and Technology 2, Room323, Department of Systems Engineering and Opera-tions Research, George Mason University, Fairfax, VA22030-4444, USA. E-mail: [email protected]

Copyright © 2000 John Wiley & Sons, Ltd.Recei6ed 10 December 1999

Accepted 20 September 2000

Page 2: Dynamic decision making: a comparison of approaches

P.C.G. DA COSTA AND D.M. BUEDE244

would also have to cope with the ‘downstreamdecisions’ (Howard, 1998), where an action madeat time ‘t ’ would change the problem’s character-istics for the next iteration of the system (‘t+1’).So a fourth axis would be the time dependency ofthe decisions taken, which brings a new source ofcomplexity to the problem, and has preventedmany decision systems from succeeding. Inessence, evolving decisions are taken by analysinginformation that is usually gathered asyn-chronously.

3. APPROACHES FOR DYNAMIC DECISIONS

Decisions that evolve over time, depend upon asynchronous data, and address competing objectives are a huge challenge for every decision-maker or automated decision system. An appropriate formal approach for facing this problem is dynamic programming (DP) (Bellman, 1957; Bertsekas, 1995), a technique that makes a comprehensive analysis of all possible alternatives through an elegant, yet computationally hard, algorithm.

In short, the idea is to decompose the problem by ordering its decisions into segments. Then, starting with the last segment and moving backwards to the first, each decision segment is evaluated for all possible prior segment sequences, using a value function that can be separated into additive (or multiplicative) components matching the decision segments. Ultimately, one decision alternative is selected for each segment; the set of selected alternatives that provides the highest value is called a portfolio. Each possible portfolio is called a path, and a path that returns the maximum value over all possible paths is considered an optimal path. As stated above, for dynamic programming to be applicable, the value function has to be separable over the chosen decision segments. This separable value function allows each path's overall value to be computed as the sum of the values of its decision segments.

Dynamic programming can be considered a very precise, elegant approach for dynamic decisions that can be structured to meet its assumptions, namely segmenting the decisions with a separable value function across segments. In essence, these assumptions permit all possible paths to be analysed without each of them being computed explicitly.

For example, if two subpaths over the same prior decision segment are computed and one is better, then the less valuable subpath can be dropped from consideration when additional segments are considered, as the sketch below illustrates. Unfortunately, in spite of these advantages, the lack of an efficient computational algorithm makes DP more a conceptual tool than a practical one for many real-world dynamic decisions.
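To make the backward recursion and the pruning of dominated subpaths concrete, the following minimal Python sketch evaluates segments from last to first; the states, options and values are hypothetical placeholders, not taken from any example in this paper.

```python
# Sketch of backward dynamic programming over decision segments.
# Keeping only the best value-to-go per state is what drops dominated
# subpaths from consideration as earlier segments are evaluated.

def backward_dp(num_segments, states, options, transition, value):
    """transition(state, option) -> next state (deterministic here);
    value(k, state, option) -> additive value of an option in segment k."""
    value_to_go = {s: 0.0 for s in states}      # value after the last segment
    policy = {}
    for k in reversed(range(num_segments)):     # last segment first
        new_vtg = {}
        for s in states:
            best = max(options,
                       key=lambda o: value(k, s, o) + value_to_go[transition(s, o)])
            new_vtg[s] = value(k, s, best) + value_to_go[transition(s, best)]
            policy[(k, s)] = best
        value_to_go = new_vtg
    return value_to_go, policy

# Hypothetical two-state, two-option illustration.
states, options = ["low", "high"], ["keep", "switch"]
transition = lambda s, o: s if o == "keep" else ("high" if s == "low" else "low")
value = lambda k, s, o: {"low": 1.0, "high": 2.0}[s] - (0.5 if o == "switch" else 0.0)
print(backward_dp(3, states, options, transition, value))
```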

In recent years, two optimization techniques have been developed to address dynamic decisions: partially observable Markov decision processes (POMDP) (Astrom, 1965; Sondik, 1971, 1978; Smallwood and Sondik, 1973; Monahan, 1982) and dynamic decision networks (DDN) (Buede, 1999; Buede and Costa, 1999; Costa, 1999). The first is a special case of a Markov DP, while the latter combines Bayesian networks (BN), influence diagrams, and multi-attribute utility theory.

Both techniques are powerful enough for modelling dynamic decisions. However, it is our view that the Markovian approach, in addition to presenting serious limitations on its tractability, lags behind the DDN technique in terms of overall flexibility for dealing with this specific kind of problem. In order to introduce this idea, we initially provide a brief background on each approach and then present two case studies. The first example, a very simple one, is intended to give a fairly straightforward view of how the solution process of each approach is conducted. The second case study involves multi-sensor data fusion with a time-dependent decision, which is a special class of dynamic decision. This example is intended to provide a means for a more in-depth analysis of each technique.

Following the case studies, Section 8 presents our comparative analysis of the POMDP and DDN approaches. In that section, we ignore issues regarding the tractability of the models so that we can focus on the conceptual issues that deal directly with framing dynamic decisions. As a result, our conclusions can be seen as independent of the current state of the art of each approach.

4. DECIDING WITH MARKOV PROCESSES

4.1. Markov processes background
In order to understand a Markov process, the concepts of state and state transition are crucial.



A state is any set of variables (thus called system state variables) whose values fully describe the entire system at a given instant in time. A state transition represents the probability of going from a given state to another, and the set of a system's state transitions is usually represented by an n×n square matrix called the state transition matrix, where n is the number of possible states of the system.

The number of state variables and the number of values of each variable are directly related to the required level of complexity for each given application, preferably being as small as possible for the sake of computational simplicity. Figure 1 shows a simple situation modelled as a Markov process.

Figure 1. Truck availability model.

In this example, the manager of a four-truck fleet delivery company has to plan the drivers' schedule for the next day. However, truck maintenance is conducted at night and he needs to build the schedule before noon. In order to predict the next day's truck availability, the manager designed a model that uses the number of trucks available on a given day to forecast the same information for the next day. This very simplified model needs only one variable to describe the situation. Such a variable has four possible values: zero, one, two, or three available trucks, which are also the states of the system. A transition probability matrix for this case would look like the one below.

From/to     0      1      2      3
   0      0.12   0.45   0.30   0.23
   1      0.08   0.30   0.42   0.25
   2      0.04   0.26   0.30   0.40
   3      0.02   0.10   0.28   0.60

State transition probability matrix

Intrinsic to this model is the key concept that gives the process its name: the Markov assumption. The Markov assumption states that the next state of the system at time t+1, s_{t+1}, is probabilistically dependent upon the current state (s_t), but conditionally independent of all previous states given the current state:

P(s_{t+1} | s_t, s_{t-1}, s_{t-2}, ..., s_{t-n}) = P(s_{t+1} | s_t)

For the truck manager, the fact that he had two trucks available this morning is all he needs to know in order to predict tomorrow morning's probability of having zero (4%), one (26%), two (30%), or three trucks (40%) available. In other words, because of the Markov assumption, no history prior to this morning's information is required.
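As a minimal sketch (in Python, using the matrix values exactly as given above), the one-step prediction amounts to reading the row of the transition matrix that corresponds to the current state or, more generally, multiplying a belief vector by the matrix:

```python
# One-step Markov prediction for the truck example.
P = [
    [0.12, 0.45, 0.30, 0.23],   # from 0 available trucks
    [0.08, 0.30, 0.42, 0.25],   # from 1
    [0.04, 0.26, 0.30, 0.40],   # from 2
    [0.02, 0.10, 0.28, 0.60],   # from 3
]

current_state = 2               # two trucks available this morning
print(P[current_state])         # tomorrow: [0.04, 0.26, 0.30, 0.40]

# If today's state were itself uncertain (a distribution b over states),
# tomorrow's distribution would be the vector-matrix product b P:
b = [0.0, 0.0, 1.0, 0.0]
tomorrow = [sum(b[i] * P[i][j] for i in range(4)) for j in range(4)]
print(tomorrow)
```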

4.2. Markov DP
The manager's truck model gives only a probabilistic expectation of the next day's truck availability, which may guide other events like the drivers' and delivery schedules. However, even though we may call it a dynamic probabilistic model, it is not yet a 'problem', because it acts only as a predictor of future events. A great deal of power and usefulness can be added to this scheme when we also model the manager's decisions, with outcomes and values that affect the system's states, transforming it into a Markov DP (MDP) (Howard, 1960, 1971; White, 1992).

That transformation is achieved by introducing a reward function into the model, which attaches values to both state occupancies and state transitions. Note that one multi-attribute value function could be defined over states and another over transitions. In addition, since both the states and the transitions are assumed finite, a reward function for states and transitions can be precomputed. The general form of the reward function is

V = f(O_1, O_2, ..., O_n) = Σ_k g(S_{ik}) + h(T_{ijk})

where O_1, O_2, ..., O_n are the objectives of the decision-maker, S_{ik} denotes state i at time k, and T_{ijk} denotes the transition from state i to state j at time k. Such a function allows an algorithm to weigh the benefits of the outcomes and transitions of every action available to the decision-maker, thus providing the criterion needed to choose the best among them.
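A small sketch of this reward form follows; the scoring functions g and h below are hypothetical placeholders chosen for the truck example, not values defined in the paper.

```python
# Evaluate V = sum_k g(S_ik) + h(T_ijk) along one possible path of states.
def path_value(visited_states, g, h):
    occupancy = sum(g(s) for s in visited_states)                  # state rewards
    transitions = sum(h(visited_states[k], visited_states[k + 1])  # transition rewards
                      for k in range(len(visited_states) - 1))
    return occupancy + transitions

g = lambda s: float(s)                       # hypothetical: value of s trucks available
h = lambda i, j: -0.5 if j < i else 0.0      # hypothetical: penalty for losing trucks
print(path_value([3, 2, 2, 3], g, h))        # 10.0 - 0.5 = 9.5
```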



As we might infer from the above paragraphs, an MDP is comprised of a state space, the transition probabilities that govern the dynamic nature of the process, a (finite) set of decisions that may control the system, and a reward structure for evaluating the outcomes of each available decision from that set. In addition, the subset of options taken in each iteration of the model may be called a policy, and the best policy (the one with the best expected reward) is considered the solution of its respective MDP.

Another important characteristic of a Markov model is its horizon length, which can be finite or infinite, depending upon the maximum number of iterations modelled by the system. When 'n' (the number of iterations) is finite, there are different approaches to obtaining a solution of an MDP (see Puterman, 1994), one being the Value Iteration technique (Howard, 1960). However, for an infinite 'n' the solution has to be obtained through a mathematical approximation of the value function via a convex piecewise function (Sondik, 1971).

Initially, the Value Iteration algorithm implements DP by calculating the reward of each decision for the last iteration ('n'), then repeating the calculation for period 'n−1' and adding the results to the already calculated rewards for 'n' for each option, forming various 'paths of options' (policies). This process continues until the initial stage is reached, at which point the best policy is the path of options that leads to the greatest expected reward.
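A minimal sketch of the backward recursion just described is given below for a fully observable MDP; the two-state model at the bottom is hypothetical and is not the truck or machine example.

```python
# Finite-horizon Value Iteration: start at the last stage and move backwards,
# at each stage choosing the action with the best immediate-plus-future reward.
def value_iteration(horizon, states, actions, P, R):
    """P[a][s][s2] = transition probability; R[a][s] = immediate reward."""
    V = {s: 0.0 for s in states}             # value after the final stage
    policy = []
    for stage in reversed(range(horizon)):
        newV, best = {}, {}
        for s in states:
            q = {a: R[a][s] + sum(P[a][s][s2] * V[s2] for s2 in states)
                 for a in actions}
            best[s] = max(q, key=q.get)
            newV[s] = q[best[s]]
        V = newV
        policy.insert(0, best)
    return V, policy

states, actions = [0, 1], ["wait", "fix"]
P = {"wait": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.0, 1: 1.0}},
     "fix":  {0: {0: 1.0, 1: 0.0}, 1: {0: 0.8, 1: 0.2}}}
R = {"wait": {0: 1.0, 1: 0.0}, "fix": {0: 0.5, 1: -0.5}}
print(value_iteration(3, states, actions, P, R))
```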

4.3. POMDP
A major pitfall when modelling complex, dynamic situations with Markov decision models is knowing for sure what the current state of the system is. Indeed, knowing the system's current state means knowing the exact value of each system variable at the present time, a strong demand in highly uncertain environments like those found in multi-sensor data fusion applications.

Further developments in Markov decision processes brought about the concept of the POMDP. A POMDP is a hidden Markov model about which we can receive partial information regarding the current state, so we are not able to state with certainty what that state is. Although this kind of model is more realistic, the lack of certainty about the current state of the system prevents the use of dynamic programming's conventional recursive algorithm for establishing the best policy. Also, it is important to note that no new concepts for value modelling are introduced with the POMDP.

In addition, because the information received is not complete, we will have only an estimate of the system's state, so a probability distribution function (pdf) will specify our current belief about it. Yet, since we have a dynamic system, there will be a flow of (partial) information feeding the current state's pdf, and an updating process should take place that considers the probability of each observation given each state of the model. In other words, we still have a (discrete) Markov process, but its current state is a probability distribution function updated via Bayes' rule (a non-Markov, continuous process). Such a scheme is portrayed in Figure 2.
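A minimal sketch of this belief update is shown below: the current belief is pushed through the Markov transition matrix and then reweighted by the likelihood of the observation (Bayes' rule). The four-state numbers are hypothetical, not taken from Figure 2.

```python
def update_belief(belief, P, obs_likelihood):
    """belief[i] = current P(state i); P[i][j] = transition probability;
    obs_likelihood[j] = P(new observation | state j)."""
    n = len(belief)
    predicted = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]  # predict
    unnormalized = [obs_likelihood[j] * predicted[j] for j in range(n)]         # correct
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

belief = [0.25, 0.25, 0.25, 0.25]
P = [[0.7, 0.1, 0.1, 0.1]] * 4        # hypothetical transition matrix
likelihood = [0.9, 0.3, 0.3, 0.1]     # hypothetical P(observation | state)
print(update_belief(belief, P, likelihood))
```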

Here, a situation can be modelled as a Markov process with four states, but the decision-maker receives only hints about the current state of the system. As we see in the picture, although the underlying model is discrete, the information flow feeds a continuously updated probability distribution function over the discrete model's current state. Nevertheless, decisions are usually (but not necessarily) taken just after the Markov system makes its transitions.

Figure 2. A four-state POMDP.

Partial information about a process is the situation in most real-life settings, which makes the POMDP approach a solid modelling technique. Unfortunately, the downside is the lack of efficient algorithms for solving these more complex models. Indeed, many different approaches have been investigated in the search for a fast, reliable algorithm. Examples are algorithms based on value iteration (Sawaki and Ichikawa, 1978; Cassandra et al., 1994), policy iteration (Sondik, 1978; Puterman, 1990; Hansen, 1997, 1998a,b), accelerated value iteration (White and Scherer, 1989), structured representations (Boutilier and Poole, 1996), incremental pruning (Zhang and Liu, 1996; Cassandra et al., 1997), and extra perceptual data (Chrisman, 1992).

5. DDN

5.1. Influence diagrams and BN
BN are acyclic, directed graphs whose nodes represent random variables, while their edges encode the joint probability distribution among these random variables.




The technique has been successfully used for creating consistent probabilistic representations of uncertain knowledge and for providing computationally efficient algorithms for computing the posterior probabilities of specific nodes given evidence. BN have been applied in fields as diverse as medical diagnosis (Spiegelhalter et al., 1989), image recognition (Booker and Hota, 1988), search algorithms (Hansom and Mayer, 1989), and many others. A detailed list of applications of BN is provided by Wellman and Heckerman (1995).

One of the most important features of BN is the fact that this approach provides an elegant mathematical structure for modelling complicated relationships among random variables while keeping a relatively simple visualization of these relationships. However, this paper is not intended to provide an explanation of BN. For more information on this subject, see Pearl (1988), Neapolitan (1990), Oliver and Smith (1990), Charniak (1991), or Jensen (1996).

The initial work on influence diagrams was focused on developing a computer-aided modelling tool for representing decision analysis problems (Miller et al., 1976; Howard and Matheson, 1981). Olmsted (1983) proposed and developed the idea of manipulating that tool in order not only to represent, but also to solve, decision analysis problems. Shachter (1986, 1988, 1990) further developed the necessary mathematical background for solving decision analysis problems. Multiple-objective problems can be formulated and solved easily with influence diagrams (Merkhofer, 1988; Buede and Ferrel, 1993).

Once well elaborated, an influence diagram can be analysed by a combination of graph-theoretic and mathematical operations to determine the optimal decision strategy. The operations that allow a complete analysis of any influence diagram are evidence absorption, deterministic absorption, null reversal, arc reversal and deterministic propagation.

Influence diagrams share the same graph-theoretic principles as BN, that is, the same representation of a set of random variables and its joint probability distribution. However, unlike the single set of variables found in BN, influence diagrams contain four different sets of variables, one for each type of node that can be present in the model: value nodes, decision nodes, chance nodes, and deterministic nodes.

Insightful discussions about the value of information in uncertain decision problems can be found in La Valle (1968a,b,c), Ziemba and Butterworth (1975), and Hilton (1981). Matheson (1990) shows the advantages of influence diagrams as modelling tools with respect to their ability to capture the value of information in the modelling process.

So far, we have analysed BN as inference engines and influence diagrams as a utility maximization engine within the realm of multi-attribute decision theory. Both techniques, albeit relatively recent, can be considered well established topics in current scientific research. The next section is devoted to introducing a structure that employs both techniques in order to achieve a level of synergy that maximizes their benefits, while eliminating the shortcomings each technique exhibits when employed alone on complex, real-time problems.



5.2. The 'DDN zone'
The main idea behind the concept of a DDN is to apply both BN and influence diagrams in the same application, while optimizing the resulting synergy between the two techniques. Here, the focus is on exploiting the advantages of each approach, while eliminating their weaknesses through the merging process.

DDNs are intended to solve complex, real-time decision problems that meet specific criteria. These criteria are an evolving decision window, in which a specific decision, D0 (e.g. launching a product in the market or firing a missile), can be taken or not taken at any time. During this evolving window, information is being received that affects the resolution of uncertainty concerning the outcomes on the conflicting objectives for D0; see Figure 3. Finally, an additional decision can also be made about what information to gather, D1. D0 addresses the current situation, while D1 recognizes the needs of the future.

Figure 3. Dynamic decisions in the 'DDN zone'.

The part of the decision problem domain shown in this picture is where the DDN technique clearly outperforms previous approaches, so we may refer to it as the 'DDN zone'. An example of a problem in the 'DDN zone' is the fighter pilot's weapon and sensor allocation problem (Buede and Costa, 1999; Costa, 1999), which is discussed later in this paper.

A DDN can be seen as having three different parts (see Figure 4): the decision sub-net, the data fusion sub-net, and the inference sub-nets. These sub-nets are not constant with respect to time, and the modifications between two subsequent time steps can be caused by external changes, like a new track from a sensor in a tracking control application, or by internal decisions, as the decision sub-net may turn a sensor on or off depending on the value of its information.

Figure 4. DDN integrated defence system.



One or more influence diagrams, whose main objective is to define an optimal set of actions to be taken during time step 't', comprise the decision sub-net. These influence diagrams also calculate the value of information provided to the inference sub-nets, which will be taken into account when defining the optimal set of actions for that time step. Choosing a single influence diagram to cover all decisions implies a more compact scheme, which reduces memory requirements, while adopting separate influence diagrams for each distinct set of decisions (e.g. sensor-related decisions, weapon-related decisions) increases the control over the various issues involved, improving the flexibility of the system.

One or more separate influence diagrams may be used in the data fusion sub-net, which collects the results from the many inference sub-nets and distributes them to the decision sub-net in a normalized way. In the multi-sensor data fusion example we will discuss shortly, the data fusion sub-net has no specific values to be considered or decisions to be made, so a simple set of chance and deterministic nodes is used for fusing the data from the inference sub-nets.

Finally, the inference sub-net is comprised of a set of BN whose main purpose is to analyse the data provided to the system and to make probability inferences on those data. Depending upon the application, these BN can be interconnected either directly or only through the data fusion sub-net. In most applications, each BN will be linked to a separate object of the application (e.g. in a tracking system, each track will have a separate BN updating its uncertainty). Figure 4 provides an example of a possible use of the DDN technique as the supporting technique for an integrated defence system.

In this sample application, the number of BN in the inference sub-net will be a function of the number of tracks/intruders in the controlled space. The number of nodes in each data fusion BN will depend upon the number and type of tracks in the inference sub-net. In addition, the number of nodes in each inference sub-net at time step 't' is a function of the decisions taken at time 't−1'. As an example, if a track is an enemy wild weasel aircraft, then depending upon the value of information versus the risk of keeping a radar activated, the system may decide to turn a given radar off, which will decrease the number of input nodes of each inference sub-net's BN.

A DDN system like the one portrayed in Figure 4 may become quite complex, and its creation is a task that requires a major systems engineering effort. This is the only way of dealing with the plethora of information that will arise from the many domain experts involved in the probability elicitation process. In addition, the wide spectrum of sensors, weapon systems and other external interfaces of the system will also make an optimized approach to the system's life cycle a key issue.

6. COMPARISON OF APPROACHES FOR MACHINE MAINTENANCE

For a better understanding of how each technique works, the first part of this section presents a very simple example extracted from Smallwood and Sondik (1973), in which the POMDP solution of that article is compared with its Bayesian counterpart. The comparison is then extended to a much higher level of complexity, where the DDN implementation of a real-time, multi-sensor data fusion system for solving the fighter pilot's problem (Costa, 1999) is presented. Finally, the impossibility of achieving a similarly feasible solution through the POMDP approach is analysed in light of the possible advantages and shortcomings of such a system.

6.1. The manufacturing machine problem
A hypothetical manufacturing machine is made of two identical, independent internal components, and produces a finished product at the end of a 1-h cycle. At the beginning of a cycle, there is a probability of 0.1 that a given component will be defective, and a defective component has a probability of 0.5 of causing a defective finished product.

There are four possible control alternatives for this process. The first, and simplest, is to let the manufacturing process continue without inspecting the internal components or examining the finished product. The second is to examine the finished product for quality at the end of each cycle. The third alternative is to open the machine and inspect each component, which requires one full cycle to execute; any inspected component found to be defective is immediately replaced. The last alternative is to open the machine and replace both components without inspection.

The goal here is to find the best policy for two cases: a three-stage (three-cycle) horizon and a four-stage (four-cycle) horizon.



The value structure is as follows: a good finished product yields a reward of one, and a defective product yields no reward (a reward of zero); these are rewards associated with outcomes. Costs associated with transitions include the cost of replacing a machine component, which is one. Inspecting both components not only consumes one complete cycle, but also incurs a cost of 0.5. Finally, examining the finished product incurs a cost of 0.25. As these costs are on the same scale, no multi-attribute weighting is necessary.
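As a rough check on these figures, the following sketch encodes the stated probabilities and computes the expected one-cycle reward of simply manufacturing; it assumes, for illustration only, that each defective component spoils the product independently with probability 0.5.

```python
from math import comb

P_DEFECT = 0.1   # probability that a component is defective at the start of a cycle

def p_good_product(defective_components):
    # assumption: each defective component independently spoils the product (prob. 0.5)
    return 0.5 ** defective_components

# Distribution over the number of defective components (0, 1 or 2) in a cycle
# that starts with two working components:
prior = [comb(2, k) * P_DEFECT**k * (1 - P_DEFECT)**(2 - k) for k in range(3)]

# Expected one-cycle reward of the 'manufacture' option (reward 1 for a good
# product, 0 for a defective one; no examination or inspection costs):
expected_reward = sum(prior[k] * p_good_product(k) for k in range(3))
print(prior, expected_reward)    # [0.81, 0.18, 0.01]  0.9025
```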

6.2. The POMDP solution
This situation can be modelled dependably by the POMDP technique, as the output of the process (the finished product) conveniently fits the definition of partial information about an internal MDP (the hidden machine components). That internal process has three distinct states, one for each possible combination of component conditions (0, both operable; 1, one defective; and 2, two defective), while the observations can be either a good product or a defective product. In addition, there are four possible actions at each stage: manufacture, examine, inspect and replace, and replace only (MFG, EX, INSP and REP, respectively). Figure 5, adapted from Smallwood and Sondik (1973), shows the solution for both scenarios (3 and 4 stages).

Figure 5. Solutions for the POMDP model.

Each point in the equilateral triangles of Figure 5 represents the three probabilities for states 0, 1, and 2, that is, a possible belief state vector p. In this representation, the perpendicular distance from a given point to the side opposite the i-th vertex corresponds to the degree to which the process is believed to be in state i. As an example, if we believe that our current p vector is [0.5, 0.5, 0.0], meaning that our knowledge implies equally likely probabilities of the system being in either state 0 or state 1, and zero probability of being in state 2, then this point lies at the middle of the edge between vertices 0 and 1.

The equilateral triangles portrayed in Figure 5 are called policy region diagrams, and show which decision policies are optimal for specific regions of p. Figure 5 therefore illustrates that if the belief vector p lies in region MFG, then the manufacturing option gives the best expected reward (for a three-stage horizon). Conversely, if it is very likely that both components are defective (vector p in region REP), then the replace option should be taken. In the four-stage case, there are regions in which each possible alternative is optimal. Increasing the number of stages results in an even finer division of the triangle, where policy mixtures are more common; see Smallwood and Sondik (1973) for a more detailed explanation of this subject.

6.3. Translating the POMDP to an influence diagram
We used the Bayesian updating software Netica™ (trademark of Norsys Software) to develop a model that is similar to the one shown in the above paragraphs.




Figure 6 depicts the three-stage model represented as a dynamic program, with sample results for particular 'states of the world'. The use of influence diagrams as a dynamic programming tool was initially proposed by Tatman and Shachter (1990), where the idea of supervalue nodes was added to the original concept of influence diagrams. That enhancement allowed the model to mimic the ability of dynamic programming to deal with additive value functions. For easy correlation with the POMDP model, each situation is indexed by its respective POMDP p vector.

In this simple model, once we enter the p vector (in the form of a likelihood vector) in the leftmost box (see arrow 'a'), the expected value of each option for all stages is displayed in the decision boxes (labelled 'option c' in Figure 6). The table embedded in Figure 6 shows the results for four different input p vectors. Three of them correspond to the vertices of the triangle in Figure 5, while the fourth corresponds to the centre of that same triangle (equally likely probabilities).

Figure 6. Bayesian inference model for the machine problem with three stages.

Although simple and functional, the dynamic program has a serious shortcoming in its exponentially growing memory requirements. As an illustration, the main value node (the small node at the bottom of the diagram), which combines all the expected values of the options in order to select the best policy, has a table that stores all possible combinations of its value structure; for the three-stage model this amounts to 1000 elements (ten possible values for each stage, raised to the power of 3, the number of stages). The same table in the four-stage model has 10 000 (10^4) elements, and so on. Even a simple model like this would require a prohibitive amount of memory to compute more stages.

Nevertheless, the Bayesian approach has a great advantage over similar POMDP models. The technique takes advantage of independence assumptions to make computation more efficient, usually producing fairly tractable models even for a large number of variables, which is not the case for Markov models. As an example of this characteristic, suppose that this very same machine also has a shaft that needs to be retightened occasionally, a procedure that is performed every time the machine is opened (the INSP and REP options).

This shaft links the two components of the machine, and we assume that it can be in one of three states: tightened, loosened, or vibrating. A loose shaft deteriorates the machine's performance as a whole by 10%, while a vibrating shaft has the same effect but by a margin of 25%. This deterioration is independent of the state of the machine's main components. Also, a tight shaft may either become loose or start vibrating, with probabilities of 0.15 and 0.04, respectively, while the probability that a loose shaft will start vibrating is 0.10.
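The sketch below shows why this extra variable is cheap for the BN but costly for the POMDP: the joint state space becomes every combination of the two variables. The shaft self-transition probabilities (0.81, 0.90, 1.0) are completions inferred from the figures above, not values given explicitly in the text.

```python
shaft_states = ["tight", "loose", "vibrating"]
shaft_P = [
    [0.81, 0.15, 0.04],   # tight -> tight / loose / vibrating (0.81 = 1 - 0.15 - 0.04)
    [0.00, 0.90, 0.10],   # loose -> loose / vibrating (re-tightening needs the machine opened)
    [0.00, 0.00, 1.00],   # assumed: a vibrating shaft stays vibrating until retightened
]

component_states = [0, 1, 2]          # number of defective components
# Joint (POMDP) state space: every combination of the two variables.
joint_states = [(c, s) for c in component_states for s in shaft_states]
print(len(joint_states))              # 9 states, hence a nine-element p vector

# In the BN/DDN model the shaft is only a few extra nodes with small local
# tables; the component variable and its tables are untouched.
```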




In practice, this new element of the model acts like a new variable that is independent of the variable 'state of the components'. In the POMDP case, the extra variable increases the number of states threefold, and thus the size of the p vector, which translates into a much harder solution. In the BN case, the peculiarities of the situation resulted in very few modifications to the value structure: the table for the main value node did not change at all, and only three new chance nodes had to be added, resulting in practically no change in the solution algorithm's performance. Figure 7 shows the new model, while the attached table shows some results for situations again indexed by their POMDP p vectors (now with nine elements).

Figure 7. Two-element machine with the shaft component.

Although the POMDP system performed fairly well in this simple example, the technique's sensitivity to the number of variables prevents it from performing as well in most real applications. It is also interesting to note that this problem does not meet all the constraints required for the application of DDN; in other words, it is not inside the 'DDN zone'. As we said before, one of the major characteristics of this specific part of the decision problem domain is the randomness of the interval between decisions.

Indeed, the decisions to be taken in the manufacturing machine problem follow a predictable pattern with respect to time, which allows the successful use of more than one optimization technique. However, more complex problems, with decisions randomly spread through time, pose a tougher challenge. This is where the DDN approach has clear advantages over any other technique. The next section presents an example of a problem inside the DDN zone, and shows how the POMDP model fails to provide a feasible solution, even if we ignore the limitations listed in the previous paragraph.




7. COMPARISON OF APPROACHES FOR MULTI-SENSOR DATA SYSTEMS

Although simple cases like the one we have just visited are convenient for comparison purposes, most real-life situations can be dependably modelled only by much more complex schemes. A clear illustration of this fact is found when modelling real-time, feature-rich decision problems like the fighter aircraft's management of sensors and weapons. For this specific kind of problem, it is our view that no other optimization technique can achieve more tractable, reliable models (and results) than the DDN technique.

Indeed, it has already been shown (Costa, 1999) that the rule-based system approach, though computationally attractive, does not provide a flexible and reliable means of dealing with the high degree of uncertainty associated with this situation, usually yielding models that are naïve at best. Here, our intention is to demonstrate that DDN also outperforms the POMDP approach for this type of application. First, the fighter pilot's problem is discussed and the DDN system developed in Costa (1999) is presented. The final subsections then convey our ideas and conclusions on the differences between the two approaches.

7.1. The fighter pilot’s problemIn a typical mission, a fighter aircraft has totake-off from a friendly aerodrome, perform ahigh-altitude flight over friendly territory, de-scend to a lower altitude preferably before theenemy’s radar coverage, attack the mission’starget(s), and egress home safely. The enemy’srole is to detect the incoming fighter and denyits attack, using weapons like interceptor aircraftwith air-to-air missiles, ground-based anti-aircraft artillery (AAA), or ground-to-air mis-siles.

Behind the scenes lies a high-tech contest between the intruding fighter and the enemy's forces, usually called electronic warfare. This contest can be compared to a game of hide-and-seek, where the intruding fighter tries to stay out of the enemy's electronic eyes (e.g. early warning radars, interceptor sensors, AAA radars) for as long as possible. The ability of the intruder to hide from these hostile sensors will depend on tactics like low-level flight and the reduction (or elimination) of communication and radar emissions. The first is intended to use the terrain as a mask against the enemy's radars, while the latter avoids discovery by the enemy's passive detectors.

However, flying into enemy territory means being vulnerable to a wide array of threats, and for most of them awareness is the first requisite for improving the chances of survival. To be aware, the pilot counts on the information provided by the aircraft's own sensors, which can usually be grouped into two distinct types: passive and active. Sensors in the first group detect all transmissions and classify their respective sources; thus, they do not need to make any transmissions themselves. Sensors in the second group are those that transmit energy for a period of time and wait for a reply in order to obtain information.

Although passive sensors are a stealthy way of gathering information about the enemy, an obvious drawback is the fact that their efficacy depends on whether the enemy is emitting. In addition, passive sensors like the radar homing and warning receiver (RHAW, which provides rough information on the bearing of the source of an electromagnetic signal) do not provide a reliable measure of distance, at best providing a 'line of sight' detection, at worst a 'plane of sight' for a given azimuth angle and unknown altitude.

Active sensors, on the other hand, usually provide more accurate measurements. As a consequence, the decision whether to decrease uncertainty or detectability is an important trade-off, particularly during an attention-demanding situation such as a flight sortie. However, there are other issues in the use of active sensors. Among them is the management of the sensor's power, that is, how to direct (allocate) it among the many surrounding enemy targets/aggressors. The pilot performs this allocation wisely, in order to achieve an optimal use of the aircraft's weapon systems (offensively and/or defensively). Here, the level of uncertainty will also guide the pilot's decisions.

Those decisions are not restricted to electronic warfare considerations. The pilot also has to deal with navigation issues, monitoring of complex aircraft systems, damage control, and fuel consumption; ultimately, he still has to fly his aircraft in a 540-knot, near-the-ground flight. In addition, modern aircraft and sophisticated defence systems have dramatically increased the pilot's workload, particularly in the most critical phase of the mission, the attack.



These aspects more than justify the efforts that have been made to reduce the number of decisions a pilot has to make during flight. Systems are being engineered to function automatically most of the time, while warning the pilot when there is a problem. Yet sensor allocation is a task that is still much more manual than automatic, due to the great amount of uncertainty faced by the pilot.

The fighter pilot’s dilemma is to find an opti-mal set of decisions for each specific moment ofhis mission. As an example, a decision of notlaunching a missile against an approaching air-craft can be the optimal one for a given time andthe worst one just a few seconds latter, all depend-ing upon the level of uncertainty about that ap-proaching aircraft being an enemy. Even if anenemy, if that intruder is not a serious threat forthe aircraft, the pilot would prefer to just avoid itand save the missile (a scarce resource) for othersituations in the mission.

In short, the fighter pilot’s problem is in acontext where the decision parameters may varydrastically over time, and the decisions taken atmoment ‘t ’ have a definitive impact on theparameters for moment ‘t+1’.

7.2. The DDN solution
In this sample implementation, the DDN structure has only one influence diagram for the decision sub-net, which is attached to a variable-size BN that comprises the data fusion sub-net. Figure 8 gives an idea of this joint structure.

Figure 8. The decision and data fusion sub-nets of the 'wise pilot' system.

The four rectangular nodes in Figure 8 are the decision sub-net's action nodes (decision nodes). These nodes rely on the information brought by the data fusion sub-net's nodes, and on the knowledge and value structure contained in the other decision sub-net nodes, to choose the most appropriate actions for each moment of the mission.

In addition to the decision nodes, the decision sub-net also carries the system's value structure. The main objective is represented by the node 'accomplish mission', which can be seen as the strategic objective of the value structure, while the three nodes that converge to the main objective complete the fundamental objective structure that guides the decision-making process (Keeney, 1992).




As the fundamental objective hierarchy in a DDN is explicitly stated, any modification in the actual problem can easily be transposed to the model, a feature that provides a great degree of flexibility to the system and is one of the approach's major strengths.

The size of the data fusion sub-net is a function of the number of BN present in the inference sub-net at a given moment in time, which is itself a function of the number of tracks detected by the aircraft's sensors. Each track will have a separate BN making the appropriate inferences about that track and passing the information to the data fusion sub-net's nodes.

Data from the environment comes into the inference sub-net through the aircraft's sensors (radar, IFF, RHAW, etc.) and navigation devices (INS, altimeter, etc.). This data is processed and used as parameters for inferring the characteristics of each specific track. This process is based on the knowledge embedded in each node of the BN, which is the result of many assessments made by experts in each specific field.

As an example, the node 'aircraft type' will contain a list of all aircraft that may be encountered in a given theatre; the probability distribution over aircraft type will reflect the chances of meeting each different type of aircraft along that mission's path. This information is assessed through expert analysis based on various factors pertinent to that specific theatre of operation. Thus, it is fair to say that the sum of the information contained in the nodes of each inference sub-net BN represents the best knowledge available at a given time for situation assessment.

Indeed, this sample system has responded very well in the controlled situations in which it has been tested, and has confirmed the applicability of the DDN technique to this kind of problem. Figure 9 shows an actual structure of the 'wise pilot' system, here implemented with the Bayesian inference software Netica™, at a moment when five tracks were perceived by the aircraft's sensors.

Figure 9. The 'wise pilot' system structure for five tracks.

Figure 9 can be considered a 'snapshot' of the current situation of the aircraft (at time 't'), as reported by the available information from its sensors. Before going to its next iteration (time 't+1'), the DDN will make some recommendations regarding the best way of using the aircraft's scarce resources in that situation. In order to make these recommendations, the DDN has to perform a complex analysis of the many trade-offs involved, like increasing the sensors' ability to detect versus keeping a lower profile (decreasing the sensors' capabilities) to avoid being detected by the enemy.




The DDN model is not only finding the maximum value of all possible defined strategies based on the available information; it is also evaluating the value of information that might be achieved by every possible configuration of the aircraft's sensors, and weighing this value of information against the cost of implementing that configuration. This is an illustration of the dual control capability of the DDN structure, where both the expected utility and the value of information of each strategy are considered in each decision iteration. Ideally, this decision process could be formulated and solved as a dynamic program, but this is clearly computationally intractable.
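A hedged sketch of the value-of-information calculation implied by this dual control behaviour is given below; the observation, decisions and utilities are hypothetical, not taken from the 'wise pilot' model, and the result would then be weighed against the cost (e.g. detectability) of obtaining the information.

```python
def expected_value_of_information(p_obs, utility_given_obs, utility_without):
    """p_obs[o] = probability of each possible observation;
    utility_given_obs[o][d] = expected utility of decision d after observing o;
    utility_without[d] = expected utility of d with no new observation."""
    with_info = sum(p_obs[o] * max(utility_given_obs[o].values()) for o in p_obs)
    without_info = max(utility_without.values())
    return with_info - without_info

# Hypothetical: turning a sensor on reveals whether a track is hostile.
p_obs = {"hostile": 0.3, "friendly": 0.7}
u_given = {"hostile":  {"engage": 0.9, "evade": 0.6},
           "friendly": {"engage": 0.1, "evade": 0.8}}
u_without = {"engage": 0.34, "evade": 0.74}   # 0.3*0.9 + 0.7*0.1 and 0.3*0.6 + 0.7*0.8
print(expected_value_of_information(p_obs, u_given, u_without))   # 0.83 - 0.74 = 0.09
```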

Another point to be emphasized is the flexibility of the structure. At time 't' we have five perceived tracks, while at time 't+1' we may have the same five tracks plus two newly detected ones, or three of the tracks already detected at 't' plus one newly detected. Most currently available optimization techniques do not provide a means of dealing with this varying number of variables in the system.

7.3. The equivalent POMDP system
In a very broad comparison, each of the DDN's inference sub-nets could be modelled as an MDP, in other words, as an internal process of a POMDP. However, each BN of the inference sub-net has a total of 34 variables, some with more than ten possible states. Even if we assume that each variable has only four states, there would be a total of 4^34 possible combinations (or 295 147 905 179 352 825 856 states in an MDP model).
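The arithmetic can be checked directly; the contrast below with the factored representation (34 separate variables rather than one joint state) is the essence of the tractability argument made in this section.

```python
# 34 variables with four values each, flattened into one joint MDP state space:
print(4 ** 34)     # 295147905179352825856
# The factored BN keeps the 34 variables separate (34 * 4 = 136 variable/value
# pairs, plus local probability tables that grow only with the number of parents).
print(34 * 4)
```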

Therefore, a simple translation from the DDN model to a POMDP, in a process similar to the one we performed in the machine example, would confront us with a thoroughly intractable problem. On top of that constraint, there is also the issue of variables with a non-constant number of states, like those in the DDN model that depend on the number of current threats. A POMDP model, which requires a static definition of the number of possible states, does not have the same flexibility as the DDN approach, with its variable number of inference sub-nets (one for each threat).

In essence, in order to model that situation with a POMDP, one would not only have to use a different synthesis approach, probably creating a new variable to account for the number of threats, but would also have to make many simplifying assumptions. Either alternative would render the model too naïve or still intractable. Even if we do not take the above obstacles into account, a POMDP model will also have some major disadvantages relative to the DDN model, the lack of an explicitly stated fundamental objective structure being an important example.

As we have already noted when analysing the DDN approach, the decision sub-net carries the problem's fundamental objective structure, which provides guidance to the decision-making process. In contrast, the value structure of a POMDP model is implicit in the definition of its states and transitions, so modifications to the problem's value structure render the model inappropriate at best. We will revisit this issue in subsection 9.1.

8. COMPARISON OF APPROACHES III: CONCEPTUAL ISSUES

In addition to the lack of feasible modelling solutions in the POMDP approach, there are also conceptual issues that render the DDN approach more appropriate for the types of problems that satisfy the DDN assumptions. Indeed, some of the advantages mentioned in the description of the DDN system simply cannot be replicated by the POMDP approach, mostly because it does not provide the same flexibility and features as the DDN scheme. The following paragraphs address these issues.

8.1. The impreciseness of sensors
As we already know, in order to analyse the current situation the fighter pilot relies on the aircraft's various sensors, which give him information on the various parameters related to the possible threats to his aircraft. However, sensor information itself is not perfect, and the pilot has to be aware of this impreciseness when deciding what action to perform. Furthermore, different sensors of an aircraft have distinct levels of reliability, which is also an issue to be considered in the decision process.

Likewise, an automated decision system has to account for this sensor impreciseness before estimating the actual situation.



In the DDN 'wise pilot' system, before being used for inference purposes, the information coming from a sensor passes through a special node called a confusion matrix (Buede and Girardi, 1997), which has the same states as the sensor's input node. The probability distribution between the input node and the confusion matrix node reflects the (im)preciseness of that sensor.
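A minimal sketch of the confusion-matrix idea follows: the sensor report is converted into a likelihood over the true values before it is used for inference. The matrix below is hypothetical and is not taken from the 'wise pilot' model.

```python
confusion = {                      # P(sensor report | true value), hypothetical numbers
    "fighter": {"fighter": 0.85, "bomber": 0.10, "unknown": 0.05},
    "bomber":  {"fighter": 0.15, "bomber": 0.75, "unknown": 0.10},
}

def report_likelihood(report):
    """Likelihood P(report | true value) for every true value: the column of the
    confusion matrix that is entered as soft evidence in the inference BN."""
    return {true: dist[report] for true, dist in confusion.items()}

print(report_likelihood("bomber"))   # {'fighter': 0.10, 'bomber': 0.75}
```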

However, in the POMDP model many states will share the same setting for a specific signature addressed by an aircraft sensor. This complex relationship between POMDP states and sensor capabilities must be considered when crafting sensor reports about the POMDP states.

In addition, computing the synergy between different sensors through the internal transition probabilities between states poses another limitation of the POMDP approach. Should any sensor be replaced or upgraded (e.g. with a more precise one), in a DDN system this would be a matter of readjusting that sensor's respective confusion matrices, while in a POMDP system many changes would have to be made to almost all of the internal transition probability matrices.

This can be thought of as a consequence of the differences in the way each scheme carries out the inference process. The DDN's nodes are actual variables of the system, and most of the belief updating is a local process (with respect to the variables). In contrast, in an internal MDP, each state is a combination of the many variables, so a transition probability matrix between two states conveys information about all the variables of the system (not a local process).

One possible way of overcoming this problem would be to extract the sensors' impreciseness from the internal MDP and resolve it in the p vector updating process. However, in order to account for the sensors' capabilities and limitations, which are also influenced by factors inside the MDP process (like the aircraft's current position), a BN system would have to be developed for the p vector updating. Such a hybrid system is more likely to combine the shortcomings of both techniques than their advantages, as some variables would probably have to be duplicated and the added complexity would only make the solution more computationally expensive.

8.2. Number of tracks
We have already covered this issue, but there are some interesting points to be raised, as both techniques present limitations in this respect. The POMDP's lack of effective algorithms for solving highly complex problems has already been discussed, so here we analyse the DDN's restrictions on this issue.

Even though the DDN is theoretically unbounded with respect to the number of threats, practical considerations on both system memory and performance dictate that there should be a limit on the number of threats to be considered by the system. As we have noted in the DDN system presentation, for each track there will be a separate BN in the inference sub-net that consumes memory and processing power and that also increases the complexity of the merging process in the data fusion sub-net.

Nevertheless, it should be noted that current technology allows this limit to be high enough that functional systems can be built. In addition, the fast pace at which processing power and memory technology are growing tends to render such a system practically unbounded in this respect.

The POMDP does not support a variable number of threats, since each state in the internal MDP process has to be defined prior to programme execution; in other words, we cannot simply 'add an enemy threat' to the internal system as it is detected by the aircraft's sensors.
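
A back-of-the-envelope calculation shows why this fixed enumeration is restrictive. Assuming, purely for illustration, that each track is described by a handful of discrete attributes (the attribute names and counts below are invented, not taken from the systems discussed here), the joint state space of the internal MDP grows exponentially with the number of tracks:

```python
# Hypothetical discrete attributes per track (illustrative values only).
ATTRIBUTE_LEVELS = {
    "identity": 3,      # friend / foe / neutral
    "range_bin": 10,
    "aspect": 4,
    "emitter_mode": 5,
}

def states_per_track() -> int:
    n = 1
    for levels in ATTRIBUTE_LEVELS.values():
        n *= levels
    return n

def joint_states(num_tracks: int) -> int:
    # Every combination of every track's attributes is a distinct MDP state,
    # and all of them must be enumerated before execution.
    return states_per_track() ** num_tracks

for k in range(1, 5):
    print(f"{k} track(s): {joint_states(k):,} enumerated states")
```

With these invented numbers, a single track already needs 600 states and four tracks need roughly 1.3 x 10^11, all of which would have to exist before the first sensor report arrives.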

There might be different solutions to this constraint, such as fixing a certain number of tracks and filling the state parameters of the non-existing ones with 'dummy' data, or creating a new variable to track the number of threats and activating different internal MDPs as needed. Yet, no matter how inventive we are, these would be only adaptations of a model that add complexity with respect to the original. In this case, the best alternative is to abandon the 'variable number of tracks' model employed in the DDN system and go to a completely different approach. However, such an effort would be cost-effective only after the other major limitations of the POMDP approach were solved.

8.3. Long horizon and the feasibility of approaches

As we know from dynamic programming, the best alternative in the short term is not necessarily the best for the long term. Indeed, we could see that the best decision policy for the manufacturing machine example would depend on how far we go into the future. This basic concept also holds for the more complicated multi-sensor data fusion problem, where the decision on whether to turn the IFF on or leave it off should be made taking into account not only its immediate rewards (i.e. determining whether a given track is a friend or a foe), but also the future implications.
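
The horizon dependence of the optimal policy can be seen in a small finite-horizon dynamic-programming sketch. The two-state machine below is only a generic stand-in for the manufacturing example referred to above; its transition probabilities and rewards are invented, and the point is simply that the recommended first-stage action changes as the horizon grows.

```python
import numpy as np

# Hypothetical two-state, two-action machine (state 0 = good, 1 = degraded).
T = {
    "run":     np.array([[0.7, 0.3],    # running may degrade a good machine
                         [0.0, 1.0]]),
    "service": np.array([[1.0, 0.0],    # servicing keeps a good machine good...
                         [0.0, 1.0]]),  # ...but cannot repair a degraded one
}
R = {
    "run":     np.array([10.0, 0.0]),   # high immediate output while the machine is good
    "service": np.array([4.0, 0.0]),    # servicing sacrifices output now
}

def best_first_action(horizon: int) -> str:
    """Backward induction; returns the optimal first-stage action in the good state."""
    value = np.zeros(2)
    best_in_good = "run"
    for _ in range(horizon):
        q = {a: R[a] + T[a] @ value for a in T}           # one backward step
        best_in_good = max(q, key=lambda a: q[a][0])
        value = np.maximum(q["run"], q["service"])
    return best_in_good

for h in (1, 3, 4, 10):
    print(f"horizon {h:2d}: best first action = {best_first_action(h)}")
```

With these numbers, 'run' is optimal for horizons of three stages or fewer, while 'service' becomes optimal once the horizon reaches four stages, which is exactly the short-term versus long-term tension discussed here.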

Apart from their convenient resemblance to many real-life situations, one of the most appealing reasons that POMDPs remain one of the main topics of scientific research today is their capacity to choose an optimal option by direct calculation of its future reward. In other words, a POMDP system will exhaustively visit all possible paths of states within a given horizon.

DDNs employ a different approach, as not all possible 'states of the world' are visited. With a structure that takes account of the probabilistic interrelationships among variables, systems like the 'wise pilot' rely on expert information to make inferences based upon the available data to calculate the best alternative. Nonetheless, it is important to note a subtler issue addressed by the DDN, namely the value of collecting information so as to resolve key uncertainties for decisions in the future. This aspect of the DDN approach addresses the trade-offs between the immediate rewards of a given action and its respective future implications.

Instead of establishing a given number of iterations and computing the optimized policy for that horizon, as was done in the influence diagram systems of Figures 6 and 7, the DDN uses the dual-control approach for computing both the best actions and the information collection activities available. This alternative scheme avoids the difficulties found in those simpler problems, where the influence diagrams, though fairly resistant to increases in the number of variables, proved to be less than practical for a longer horizon (more stages).

The main idea of the dual-control concept employed here is to use expert information as a basis for inference about both the near term and the far term. Indeed, a DDN addresses a limited set of dynamic decisions in which one alternative from a specified set needs to be selected and the big issue is when to make the selection. In the 'wise pilot' system, the short-term implications are related to the immediate gains of each action, such as firing a missile, while the longer-horizon concerns deal with what kind of information is most important if the selection is delayed, as well as what the ramifications of collecting that information with an active sensor will be.

9. COMPARISON OF APPROACHES IV: MULTI-CRITERIA ASPECTS

Even though this characteristic is not always easily perceivable, problems inside the 'DDN zone' involve decisions that cannot be taken with elementary utility functions. In other words, the complexity of the situation is also reflected in the multiple attributes or objectives that guide the decision process. Indeed, most situations inside the 'DDN zone' will require complex value structures.

Ultimately, both techniques analysed here offer a means of dealing with multi-criteria decision making. However, the methodologies employed by each approach have totally different characteristics, and it is on these particular differences that we base our claim that the DDN is more applicable to multi-criteria decision problems.

9.1. Multi-criteria in the DDN approach

In a DDN model, the value structure of a problem is represented in the decision sub-net through a direct translation of the original hierarchy of attributes into nodes, each representing a given attribute or objective. All nodes that contain value objectives collapse into an ultimate value node, which contains the utility vector of the DDN. This vector is a normalization of the various attributes' weights.

In the case study previously shown, the fighter pilot's fundamental objective hierarchy contains three secondary nodes, each carrying its own mapping from the act space to the n-dimensional consequence space (Keeney and Raiffa, 1976). The decisions available to the pilot will have different, often opposing consequences for each objective, so the need for a multi-attribute value function to deal with the trade-offs involved in the decision process is clear.
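
As a purely illustrative sketch of how such a value node combines the objectives, the snippet below scores two invented alternatives with an additive multi-attribute value function. Two of the objective names echo those mentioned in the text, the third is a placeholder, and all weights and single-attribute scores are assumptions that would in practice be elicited from the decision-maker.

```python
# Hypothetical normalized weights over the fundamental objectives
# (the third objective name is a placeholder, not taken from the paper).
WEIGHTS = {
    "avoid_fratricide": 0.40,
    "maximize_survivability": 0.35,
    "complete_engagement": 0.25,
}

# Invented single-attribute scores (0-1 scale) for two hypothetical alternatives.
ALTERNATIVES = {
    "fire_missile_now": {
        "avoid_fratricide": 0.30,        # identity still uncertain
        "maximize_survivability": 0.80,
        "complete_engagement": 0.90,
    },
    "interrogate_with_IFF_first": {
        "avoid_fratricide": 0.95,
        "maximize_survivability": 0.55,  # active emission may reveal own position
        "complete_engagement": 0.60,
    },
}

def additive_value(scores: dict, weights: dict) -> float:
    """Additive multi-attribute value: weighted sum of single-attribute scores."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[obj] * scores[obj] for obj in weights)

for name, scores in ALTERNATIVES.items():
    print(f"{name}: {additive_value(scores, WEIGHTS):.3f}")
```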

Figure 10 shows the insertion of the fundamental objective hierarchy into the decision sub-net of the 'wise pilot' DDN system. Looking at that figure, one can easily see that the interconnections between the decision nodes, the value node and all other nodes inside the decision sub-net provide a clear mapping of the many subtleties of the decision-making process in the system. In other words, the technique's graphical nature provides a means of understanding how each objective fits into the context of the system, its significance to each decision, which events (represented by chance nodes) affect that objective, and many other aspects that are relevant to the decision process itself.

In fact, even a person with limited or no background in this domain may understand the basic issues governing the decision-making process inside the system. As an example, looking at Figure 10, one needs no domain background to realize that changing the radar mode (i.e. modifying the state of the 'radar action' decision node) will have an indirect impact on two objectives, 'avoid fratricide' and 'maximize survivability'.

Yet, apart from the ease of communicating the model's subtleties, the explicit way in which the DDN approach depicts the problem's value structure has other important advantages. One advantage is the ability to change the multi-attribute value function to reflect changes among the decision parameters. Indeed, this function reflects the weight each attribute has in the overall decision process, a direct outcome of the criteria used to evaluate these attributes; changes in the evaluation criteria will have immediate consequences for the multi-attribute value function.

In the example of the fighter pilot's problem, the relative importance of the objectives will be dictated by a unique combination of political and operational issues that are characteristic of a given conflict. Distinct conflicts would certainly have different characteristics and concerns that point to different interrelationships among the fundamental objectives. Adjusting a DDN model to reflect those differences is a matter of changing the weights that are explicitly stated in the 'accomplish mission' node.

Obviously, such a feature can also be used for performing sensitivity analysis, as changes to the multi-attribute function may be easily replicated in the system and the consequences of such changes immediately assessed. Indeed, in the 'wise pilot' DDN system, a sensitivity analysis can be performed in order to make the system suitable for scenarios that have different political and/or doctrinal characteristics. In other words, the system is able to reflect changes in the way airborne operations are conducted, a very desirable feature for a decision process with multiple, volatile criteria to follow.
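
Because the weights sit in one explicit node, such a sensitivity analysis can be run as a simple sweep over them. The sketch below is meant to follow the hypothetical additive-value snippet above (it reuses WEIGHTS, ALTERNATIVES, and additive_value); it varies the weight on 'avoid fratricide', renormalizes the other weights, and reports where the preferred alternative changes. The numbers remain illustrative.

```python
def reweighted(weights: dict, objective: str, new_weight: float) -> dict:
    """Set one weight and rescale the remaining weights proportionally."""
    rest = {k: v for k, v in weights.items() if k != objective}
    scale = (1.0 - new_weight) / sum(rest.values())
    out = {k: v * scale for k, v in rest.items()}
    out[objective] = new_weight
    return out

previous_best = None
for i in range(21):                                   # sweep 0.00, 0.05, ..., 1.00
    w = i / 20
    weights = reweighted(WEIGHTS, "avoid_fratricide", w)
    best = max(ALTERNATIVES, key=lambda a: additive_value(ALTERNATIVES[a], weights))
    if best != previous_best:
        print(f"weight {w:.2f}: preferred alternative becomes {best}")
        previous_best = best
```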

Figure 10. The fighter pilot problem’s objective hierarchy inside the DDN.


9.2. Multi-criteria in the POMDP approach

As stated before, the POMDP approach can also replicate a multi-criteria decision environment, since the decisions are explicitly modelled. However, as could be noted in Section 4, the value structure is not replicated in an explicit way as in the DDN case. Instead, the value structure is transformed into rewards that are associated with each state transition or state location.

Such a scheme does not allow direct modifications to the model in order to reflect changes in the evaluation criteria. In fact, modifying the value structure, whether in the form of weights or of specific objectives, would require a complete reformulation of the reward function for all transitions and states.
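
A small sketch makes the contrast concrete: in a reward-based formulation the objective weights are folded into a single scalar reward for every state-action pair at model-building time, so any change to a weight means regenerating the whole table and re-solving. The states, actions, attribute rewards, and weights below are invented placeholders.

```python
# Hypothetical per-attribute reward components for each (state, action) pair;
# a real model would need one entry for every enumerated state.
ATTRIBUTE_REWARDS = {
    ("s0", "radar_on"):  {"avoid_fratricide": 8.0, "maximize_survivability": 2.0},
    ("s0", "radar_off"): {"avoid_fratricide": 3.0, "maximize_survivability": 7.0},
    ("s1", "radar_on"):  {"avoid_fratricide": 6.0, "maximize_survivability": 1.0},
    ("s1", "radar_off"): {"avoid_fratricide": 2.0, "maximize_survivability": 9.0},
}

def build_reward_table(weights: dict) -> dict:
    """Scalarize the multi-attribute rewards: R(s, a) = sum_i w_i * r_i(s, a)."""
    return {
        key: sum(weights[attr] * r for attr, r in components.items())
        for key, components in ATTRIBUTE_REWARDS.items()
    }

# Changing the evaluation criteria forces the entire table to be rebuilt,
# because the weights are not represented anywhere in the finished model.
print(build_reward_table({"avoid_fratricide": 0.6, "maximize_survivability": 0.4}))
print(build_reward_table({"avoid_fratricide": 0.3, "maximize_survivability": 0.7}))
```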

In other words, attempting to perform a sensitivity analysis, or even to incorporate a single change in the evaluation criteria, would require an extensive re-engineering effort across the whole system, since its value structure is implicit in every one of its states and transitions.

As a consequence, even though the approach allows the modelling of a multi-criteria decision problem, the parameters related to the weight and relevance of each attribute will be spread among the system's variables. Apart from the technique's shortcomings that we have already covered, and from its intractability for complex problems, this limitation is the definitive reason for not using it in multi-criteria decision problems.

10. EPILOGUE

Both POMDP and DDN are analytical models that rely on the postulates of multi-attribute utility theory and probability theory and are aimed at solving complex, real-life problems.

The POMDP model treats the situation as an internal, not directly observable Markov decision process, and models it by exhaustively listing all possible states in which the system can be, storing the probabilities of each state transition and its respective rewards. In addition, partial observations of this internal model are used for inferring the current state of the system, which is probabilistically updated for each new piece of information.

A dynamic decision network is a graphical technique that models a situation by establishing the interrelationships between its many variables, relying on Bayesian-based algorithms for updating the current knowledge of the system. In addition, a value structure is explicitly stated through special nodes, where each possible combination has its expected utility calculated and stored for use in decision selection. However, the use of the DDN approach is limited by its assumptions.

The POMDP technique, although providing an elegant probabilistic scheme and theoretical support, still has a long way to go before achieving the same practical results experienced today by DDN structures. Indeed, the current state of technology prevents the technique from achieving a tractable model, apart from the simplest situations, while DDN systems can present feasible solutions for complex systems with a similarly powerful theoretical background.

Among the issues presented when comparing the two approaches, the suitability for solving multi-criteria decision problems was analysed with no regard to the POMDP's lack of capable algorithms, and the DDN's explicit modelling of multi-attribute functions was presented as an invaluable asset. Therefore, we claim that even in the case of a breakthrough in the algorithms for solving POMDPs, the DDN remains the preferred technique when its assumptions are justified.

REFERENCES

Astrom KJ. 1965. Optimal control of Markov processes with incomplete state information. Journal of Mathematical Analysis and Applications 10: 174–205.

Bellman R. 1957. Applied Dynamic Programming. Princeton University Press: Princeton, NJ.

Bertsekas DP. 1995. Dynamic Programming and Optimal Control. Athena Scientific: Belmont, MA.

Booker LB, Hota N. 1988. Probabilistic reasoning about ship images. Uncertainty in Artificial Intelligence 2: 371–379.

Boutilier C, Poole D. 1996. Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence. AAAI Press/MIT Press; 1168–1175.

Buede DM. 1999. Dynamic decision networks: an approach for solving the dual control problem. Paper presented at Spring INFORMS, Cincinnati, OH.

Buede DM, Costa PCG. 1999. A graph theoretic architecture for dual control: decision making in multisensor systems. Paper presented at Military Operations Research Society Symposium, West Point, NY.

Buede DM, Ferrel DO. 1993. Convergence in problem solving: a prelude to quantitative analysis. IEEE Transactions on Systems, Man, and Cybernetics 23: 746–765.

Buede DM, Girardi P. 1997. A target identification comparison of Bayesian and Dempster-Shafer multisensor fusion. IEEE Transactions on Systems, Man, and Cybernetics 27: 569–577.

Cassandra AR, Kaelbling LP, Littman ML. 1994. Acting optimally in partially observable stochastic domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence; 1023–1028.

Cassandra AR, Littman ML, Zhang NL. 1997. Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence. AAAI Press/MIT Press; 54–61.

Charniak E. 1991. Bayesian networks without tears. AI Magazine 12(4): 50–63.

Chrisman L. 1992. Reinforcement learning with perceptual aliasing: the perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence. AAAI Press: San Jose, CA; 183–188.

Costa PCG. 1999. The fighter aircraft's autodefense management problem: a dynamic decision network approach. Master's thesis, Center for Excellence in C3I, IT&E, George Mason University.

French S. 1986. Decision Theory: An Introduction to the Mathematics of Rationality. John Wiley & Sons: Chichester, UK.

Hansen EA. 1997. An improved policy iteration algorithm for partially observable MDPs. In Advances in Neural Information Processing Systems, vol. 10. MIT Press: Cambridge, MA.

Hansen EA. 1998a. Finite-memory control of partially observable systems. PhD thesis, Department of Computer Science, University of Massachusetts at Amherst.

Hansen EA. 1998b. Solving POMDPs by searching in policy space. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. Madison, WI; 211–219.

Hansson O, Mayer A. 1989. Heuristic search as evidential reasoning. In Proceedings of the Fifth Workshop on Uncertainty in Artificial Intelligence. Mountain View, CA.

Hilton RW. 1981. The determinants of information value: synthesizing some general results. Management Science 27(1): 57–64.

Howard RA. 1960. Dynamic Programming and Markov Processes. MIT Press: Cambridge, MA.

Howard RA, Matheson JE (eds). 1981. Influence diagrams. In The Principles and Applications of Decision Analysis, vol. II. Strategic Decisions Group: Menlo Park, CA; 719–762.

Howard RA. 1971. Dynamic Probabilistic Systems, vol. I. John Wiley & Sons: New York.

Howard RA. 1998. Transcript of 'Panel discussion: downstream decisions (options) and dynamic modeling', 1998 Seattle INFORMS Meeting, Session SE01, Sunday, October 25th, 16:30–18:00, Seattle, WA.

Jensen FV. 1996. An Introduction to Bayesian Networks. Springer: New York, NY.

Keeney RL, Raiffa H. 1976. Decisions with Multiple Objectives. John Wiley & Sons: New York, NY.

Keeney RL. 1992. Value-Focused Thinking: A Path to Creative Decisionmaking. Harvard University Press: Cambridge, MA.

Keynes JM. 1957. A Treatise on Probability. Macmillan: London, UK (originally published in 1921).

La Valle I. 1968a. On cash equivalents and information evaluation in decisions under uncertainty: Part I. Basic theory. Journal of the American Statistical Association 63: 252–276.

La Valle I. 1968b. On cash equivalents and information evaluation in decisions under uncertainty: Part II. Incremental information decisions. Journal of the American Statistical Association 63: 277–284.

La Valle I. 1968c. On cash equivalents and information evaluation in decisions under uncertainty: Part III. Exchanging partition-J for partition-K information. Journal of the American Statistical Association 63: 285–290.

Laplace PS. 1951. A Philosophical Essay on Probabilities. Dover: New York, NY (originally published in 1820).

Matheson JE. 1990. Using influence diagrams to value information and control. In Influence Diagrams, Belief Nets, and Decision Analysis, Oliver RM, Smith JQ (eds). John Wiley & Sons: Chichester, UK; 25–48.

Merkhofer ML. 1988. Using influence diagrams in multiattribute utility analysis: improving effectiveness through improving communication. In Proceedings of the Conference on Influence Diagrams for Decision Analysis, Inference and Prediction.

Miller AC, Merkhofer MM, Howard RA, Matheson JE, Rice TR. 1976. Development of Automated Aids for Decision Analysis. Stanford Research Institute: Menlo Park, CA.

Monahan GE. 1982. A survey of partially observable Markov decision processes: theory, models, and algorithms. Management Science 28(1): 1–16.

Neapolitan RE. 1990. Probabilistic Reasoning in Expert Systems: Theory and Algorithms. John Wiley & Sons: New York, NY.

Oliver RM, Smith JQ. 1990. Influence Diagrams, Belief Nets, and Decision Analysis. John Wiley & Sons: Chichester, UK.

Olmsted SM. 1983. On representing and solving decision problems. PhD thesis, EES Department, Stanford University, Stanford, CA.


Pearl J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers: San Francisco, CA.

Puterman ML. 1990. Markov decision processes. In Handbooks in OR & MS, vol. 2, Heyman DP, Sobel MJ (eds). Elsevier Science Publishers: Amsterdam; 331–434.

Puterman ML. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons: New York.

Savage LJ. 1954. Foundations of Statistics. John Wiley & Sons: New York.

Savage LJ. 1961. The foundations of statistics reconsidered. In Proceedings of the Fourth Berkeley Symposium on Mathematics and Probability; reprinted in Readings in Uncertain Reasoning, Shafer G, Pearl J (eds). Morgan Kaufmann: Palo Alto, CA.

Sawaki K, Ichikawa A. 1978. Optimal control for partially observable Markov decision processes over an infinite horizon. Journal of the Operations Research Society of Japan 21(1): 1–14.

Shachter RD. 1986. Evaluating influence diagrams. Operations Research 34: 871–882.

Shachter RD. 1988. Probabilistic inference and influence diagrams. Operations Research 36: 589–604.

Shachter RD. 1990. An ordered examination of influence diagrams. Networks 20: 535–563.

Smallwood RD, Sondik EJ. 1973. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research 21: 1071–1088.

Sondik EJ. 1971. The optimal control of partially observable Markov decision processes. PhD dissertation, Stanford University, Stanford, CA.

Sondik EJ. 1978. The optimal control of partially observable Markov decision processes over the infinite horizon. Operations Research 26.

Spiegelhalter DJ, Franklin R, Bull K. 1989. Assessment, criticism and improvement of imprecise probabilities for a medical expert system. In Proceedings of the Fifth Workshop on Uncertainty in Artificial Intelligence. Mountain View, CA; 335–342.

Tatman JA, Shachter RD. 1990. Dynamic programming and influence diagrams. IEEE Transactions on Systems, Man, and Cybernetics 20.

Von Neumann J, Morgenstern O. 1944. The Theory of Games and Economic Behavior. Princeton University Press: Princeton, NJ.

Watson SR, Buede DM. 1987. Decision Synthesis. Cambridge University Press: New York.

Weatherford R. 1982. Philosophical Foundations of Probability Theory. Routledge & Kegan Paul: London.

Wellman MP, Heckerman D. 1995. Real-world applications of Bayesian networks. Communications of the ACM 8: 24–30.

White III CC, Scherer WT. 1989. Solution procedures for partially observable Markov decision processes. Operations Research 37: 791–797.

White DJ. 1992. Markov Decision Processes. John Wiley & Sons: New York.

Zhang NL, Liu W. 1996. Planning in stochastic domains: problem characteristics and approximation. Technical Report HKUST-CS96-31, Department of Computer Science, Hong Kong University of Science and Technology.

Ziemba WT, Butterworth JE. 1975. Bounds on the value of information in uncertain decision problems. Stochastics 1: 361–378.
