
Annals of Mathematics and Artificial Intelligence, 2 (1990) 209-220

PROBABILISTIC HEURISTIC ESTIMATES *

Othar HANSSON and Andrew MAYER

Computer Science Division, University of California, Berkeley, CA 94720, USA

Abstract

Though heuristics constitute the major knowledge source in problem-solving systems, no unified theory of heuristics has emerged. Pearl [15] defines heuristics as "criteria, methods, or principles for deciding which among several alternative courses of action promises to be the most effective in order to achieve some goal". The absence of a more precise definition has impeded our efforts to understand, utilize, and discover heuristics. Another consequence is that problem-solving techniques which rely on heuristic knowledge cannot be relied upon to act rationally - in the sense of the normative theory of rationality.

To provide a sound basis for BPS, the Bayesian Problem-Solver, we have developed a simple formal theory of heuristics, which is general enough to subsume traditional heuristic functions as well as other forms of problem-solving knowledge, and to straddle disparate problem domains. Probabilistic heuristic estimates represent a probabilistic association of sensations with prior experience - specifically, a mapping from observations directly to subjective probabilities which enables the use of theoretically principled mechanisms for coherent inference and decision making during problem-solving. This paper discusses some of the implications of this theory, and describes its successful application in BPS.

Keywords: Problem-solving, state-space, heuristic search, heuristic evaluation function, Bayesian probability, decision theory, utility theory, learning.

1. A model of Bayesian problem-solving

We begin by carefully reassessing the basic problem-solving process to gain an understanding of the key issues, and insight into how to address them. The approach will be one of problem-reduction - to recursively decompose the task of a problem-solving agent into distinct subproblems - until all the subproblems have been solved. This task analysis will then direct the design of our problem-solving system, BPS.

1.1. STATE-SPACE PROBLEM-SOLVING

The state-space approach to problem-solving [14] considers a problem $P$ as a quadruple, $(S = \{S_0, S_1, \ldots\},\ O \subseteq S \times S,\ I \in S,\ G \subseteq S)$.

* This research was made possible by support from Heuristicrats, the National Aeronautics and Space Administration, and the Rand Corporation.

© J.C. Baltzer A.G. Scientific Publishing Company


$S$ is the set of possible states of the world, $O$ is the set of operators, or transitions between states, $I$ is the initial state, and $G$ is the set of goal states. In theory, any problem can be represented as a state-space graph, where the states are nodes, and the operators are directed, weighted arcs between nodes. The problem is said to be solved when a sequence of operators, the solution-path, has been applied to state $I$, yielding a state in $G$.

From an agent's perspective, the state-space is a tree rooted at the current state - a decision-tree. As a solution-path is simply a sequence of individual operators, solving $P$ reduces to a sequential decision process - at any point, the agent must decide which of the adjacent states is the most advantageous to move to. We will refer to each such adjacent state as an alternative, $A_i$.

1.2. DECISIONS UNDER CERTAINTY

In order to distinguish among alternatives, an agent needs to determine the eventual consequences, or outcome, of each. This is because the desirability of an alternative is defined in terms of the outcome to which it leads - for example, a queen capture in chess is usually a preferred move because it is expected to lead to a win. With the outcome of each alternative determined, the agent can select the alternative with the highest quality outcome. For clarity, let $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ denote the set of possible outcomes for $P$. In simple navigation problems an agent's measure of solution quality may depend only on the length of the solution-path, and thus $\Omega$ consists of different path-lengths. In chess, the set of outcomes is $\Omega = \{\text{win}, \text{loss}, \text{draw}\}$.

1.3. DECISIONS UNDER UNCERTAINTY

In theory, each state in the state-space has an outcome associated with it. In any domain, an agent could make optimal moves if he knew, with certainty, the outcome of each alternative. In practice, however, an agent has no means for determining this association, and will be uncertain of the precise outcome of each $A_i$ (therefore, let $O_i$ be a variable indicating the outcome of $A_i$). Fortunately, a formal approach to solving such decision problems exists.

Utility theory captures the preferences of an agent faced with uncertainty - utility being the subjective assignment of value to potential outcomes, when the exact outcome is unknown [17,21,22]. The field of Decision Analysis explains how to devise such a utility function, $U(\omega_j)$, which models an agent's assignment of utilities to different outcomes. The theory dictates that a rational agent must choose that $A_i$ which maximizes his expected utility, $EU(A_i) = \sum_j U(\omega_j) P(O_i = \omega_j)$, where $P(O_i = \omega_j)$ is the probability that the outcome of $A_i$ is $\omega_j$. For notational convenience, we will denote the vector of probabilities $(P(O_i = \omega_1), P(O_i = \omega_2), \ldots, P(O_i = \omega_n))$ by $P(O_i)$.
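To make the expected-utility rule concrete, here is a minimal sketch in Python; the chess outcome set is taken from section 1.2, while the utility values and probability vectors are illustrative assumptions of ours, not figures from BPS.

```python
# Minimal sketch of expected-utility choice over chess outcomes.
# The utilities and the probability vectors below are illustrative only.

OUTCOMES = ["win", "loss", "draw"]
UTILITY = {"win": 1.0, "loss": 0.0, "draw": 0.5}

def expected_utility(p_outcome):
    """EU(A_i) = sum_j U(w_j) * P(O_i = w_j)."""
    return sum(UTILITY[w] * p_outcome[w] for w in OUTCOMES)

# Two hypothetical alternatives with differing outcome distributions.
alternatives = {
    "capture queen": {"win": 0.6, "loss": 0.1, "draw": 0.3},
    "quiet move":    {"win": 0.3, "loss": 0.2, "draw": 0.5},
}

best = max(alternatives, key=lambda a: expected_utility(alternatives[a]))
print(best, expected_utility(alternatives[best]))
```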

2. Probabilistic heuristic estimates

Thus, a formal framework exists for rational choice where $P(O_i)$ can be determined for each $A_i$. To do so, our problem-solving process becomes dependent upon a particular problem domain. We require a tool which can examine an alternative, $A_i$, and arrive at an estimate of $P(O_i)$. The only available state evaluator is a heuristic function, $h$, a discriminant function used to classify states based on their visible features.

Fundamentally, heuristic functions act as sensors by which an agent may measure his environment. An agent is analogous to a blind man, who comes to know the world through the sense of touch, perceiving another's face from the sensations transmitted by his fingertips. This example illustrates an important point, that perceptions need bear little resemblance to the raw sensations which are experienced [3]. Perceptions are sensations, interpreted. The heuristic functions, described above, fulfill only a sensory role. Understanding a problem domain requires both sensation and perception, and hence the probabilistic heuristic estimate, or perception of a state, is also required.

A probabilistic heuristic estimate (PHE) is a means of converting $h(A_i)$ into an estimate of $P(O_i)$ - informally, the heuristic estimate calibrates the sensor. Formally, we say a probabilistic heuristic estimate is a tuple, $(h(S_i), P(O_i \mid h(S_i)))$. The "sensor" $h$ is a function which induces equivalence classes in the state-space. The "perceptor" $P(O_i \mid h(S_i))$ (or unambiguously $P(O_i \mid h)$) is a conditional probability distribution of outcomes given the heuristic value. $P(O_i \mid h)$ represents the degree of belief assigned to the possible outcomes, given $h$ as evidence. An example PHE is depicted in fig. 1, which shows the probability distribution over Manhattan Distance heuristic values and Eight Puzzle outcomes (i.e., shortest-path lengths) - $P(O_i \mid h(S_i))$ is a slice of this distribution (a fixed $x$ value). This PHE was used in our empirical tests (section 6).

Fig. 1. Probability distribution over Manhattan Distance (x) and solution lengths (y).


Implications. Assuming that we have such a PHE, we have satisfied the formal requirements of the decision problem and can attempt to solve $P$ by repeating the following algorithm:

DECIDE($A_1, \ldots, A_n$):
- Evaluate each alternative, $A_i$, using $h(A_i)$.
- Assess probabilities using $P(O_i \mid h(A_i))$.
- Compute the expected utility of each alternative, $EU(A_i) = U(\Omega) \cdot P(O_i)$.
- Move to the alternative with maximum expected utility.
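A minimal sketch of this loop in Python, under our own placeholder names: a heuristic function h, a PHE table phe mapping each heuristic value to an outcome distribution, and a utility table over outcomes.

```python
# Minimal sketch of DECIDE, assuming (placeholder names, not BPS internals):
#   h(state)         -- the heuristic function (the "sensor")
#   phe[h_value]     -- the PHE table: P(O | h), a dict outcome -> probability
#   utility[outcome] -- the agent's utility for each outcome

def decide(alternatives, h, phe, utility):
    def eu(state):
        p = phe[h(state)]                                # assess P(O_i | h(A_i))
        return sum(utility[w] * pw for w, pw in p.items())
    return max(alternatives, key=eu)                     # max expected utility
```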

3. Inference from PHEs

The DECIDE algorithm (section 2) suffers from its dependence on the discriminatory ability of the PHE. In the best case, the PHE would correctly identify the actual outcome, causing certainty in the $P(O_i)$, and enabling optimal movement. The more poorly the PHE discriminates, the less accurate the $P(O_i)$, and therefore, the less accurate the agent's calculation of the expected utility, $EU(A_i)$. Because the agent will base his decisions solely on the $EU(A_i)$, he should endeavor to make them as accurate as possible. We will assume that we cannot do so by increasing the discriminatory ability of the PHE, for that would require altering $h$, which we have assumed to be external to the system.

Fortunately, we may compensate by taking advantage of the strong dependence between adjacent states in a decision-tree. For example, in single-agent navigation domains, outcomes of adjacent nodes can differ by no more than the cost of the operator connecting them. Assuming unit operator cost, if an agent knew that a state $S_A$ had equiprobable outcomes (here, solution-path lengths) $x$ and $x+1$, and learned that the actual outcome of an adjacent state $S_B$ was $x+2$, he could infer that the actual outcome of $S_A$ is $x+1$. This is merely a special case of the probabilistic inference which could be made in the general case. For example, on learning that $S_B$ was "very likely" to be $x+2$, the agent would increase his belief that the actual outcome of $S_A$ was $x+1$ and correspondingly decrease his belief in $x$.
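The unit-cost special case can be written directly as conditioning: outcomes incompatible with the neighbor's observed outcome are eliminated, and the surviving belief is renormalized. A minimal sketch (the function name and the numbers are ours):

```python
# Sketch of the adjacency constraint with unit operator cost: outcomes of
# neighboring states differ by at most 1, so observing the neighbor's
# outcome prunes and renormalizes the belief over S_A's outcomes.

def condition_on_neighbor(belief_a, outcome_b, cost=1):
    """belief_a: dict mapping outcome (path length) -> probability for S_A."""
    posterior = {x: p for x, p in belief_a.items() if abs(x - outcome_b) <= cost}
    z = sum(posterior.values())
    return {x: p / z for x, p in posterior.items()}

# The example above: P(x) = P(x+1) = 0.5, neighbor's outcome is x+2.
x = 10
print(condition_on_neighbor({x: 0.5, x + 1: 0.5}, x + 2))  # {11: 1.0}
```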

3.1. SEARCH

Agents can exploit these constraints during problem-solving. If an agent doubts the accuracy of the $P(O_i)$ he has computed, he may extend the DECIDE algorithm and evaluate additional states which are adjacent to his immediate alternatives. Information derived from these states may further constrain the $P(O_i)$, making them more accurate, therefore making the $EU(A_i)$ more accurate, and ultimately enabling a more informed decision.

Exploring yet more distant levels of the decision-tree may also be profitable. The benefit of additional exploration can be fully expressed in terms of conditional probabilities. After evaluating an alternative $A_i$, the belief in different outcomes is precisely $P(O_i \mid h(A_i))$. Exploring further, and evaluating $n$ additional states $\{S_{e_1}, \ldots, S_{e_n}\}$ in the subtree below $A_i$, changes an agent's belief to $P(O_i \mid h(A_i), h(S_{e_1}), \ldots, h(S_{e_n}))$.

This process of exploring a decision-tree is known as search, in which the atomic action is the state expansion. Expanding a state consists of evaluating that state and generating its successors, which are then available for subsequent expansion. * The states available for expansion will be called leaves of the search tree.

It might seem that an agent should search the decision-tree exhaustively, because conditioning on all available information would yield the greatest possible accuracy in the $P(O_i)$. Unfortunately, agents are constrained to operate under finite resource restrictions, and search is a resource-draining process (e.g., time and memory). Excessive search will typically diminish the multiattribute utility of the solutions ultimately found, if time is an attribute [8].

Thus, due to resource limitations which are reflected in his utility function, an agent will be compelled to limit his search to a small portion of the entire decision-tree. Two subtasks then emerge - choosing the best search tree from which to draw inference, and drawing that inference. We consider elsewhere the problem of choosing the best search tree, or rather, growing the tree selectively [7], and discuss here only the inference problem.

3.2. SEARCH TREES AS BAYESIAN NETWORKS

A solution to the easily-stated inference problem - determining, from the heuristic evidence in a partial search tree, $P(O_i \mid \text{all evidence})$ for each $A_i$ - is not immediate. A solution is found by exploiting the close resemblance of the search tree to a more abstract representational model, the Bayesian belief network [16]. We briefly describe how this model can be adapted for our purposes, but refer the reader to [7] for a more thorough presentation.

A fragment of a typical search tree for a one-player game is shown in fig. 2, where the current state is $S_0$. With each node is associated a variable, and a current belief vector, reflecting belief in different values for the variable - the belief corresponds to the probability of that value, given the evidence in the tree.

The variable $O_i$ associated with a state $S_i$ of the problem is the outcome of that state. The variable $h_i$ associated with each heuristic node is the value returned by the heuristic function - after we evaluate the corresponding state $S_i$, the value of this variable is fixed (at $h(S_i)$), but before doing so, we have an anticipatory belief in what the heuristic will report. Finally, on each arc in the graph is a matrix, expressing the conditional probability $P(\text{child} \mid \text{parent})$ for all values of the variables at each end of the arc.

* Our definition differs from that found in the literature.


Fig. 2. Search tree fragment.

The figure suggests that we can use the chain rule to simplify the joint probability distribution over all the variables into $\bigl(\prod_i P(i \mid \text{parent}(i))\bigr) P(\text{root})$, by making the conditional independence assumption that the parent of a node $i$ renders $i$ independent of all other non-descendants of $i$ (there is a simple transformation that renders the heuristic values independent). It is a simple matter to then compute $P(O_i \mid \text{all evidence})$ from this chain rule decomposition [7].
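On a single arc of such a network, conditioning on an observed heuristic value is one application of Bayes' rule, $P(O \mid h) \propto P(h \mid O)\,P(O)$. A minimal sketch of this one-arc update, with illustrative numbers of our own rather than values from BPS:

```python
# Sketch of the chain-rule update on a one-arc fragment: an outcome
# variable O with a heuristic child h. Conditioning on the observed h is
# a single multiplicative Bayesian update, P(O | h) ~ P(h | O) * P(O).

def update(prior, likelihood, h_obs):
    """prior: P(O); likelihood[o][h]: the arc matrix P(h | O = o)."""
    post = {o: prior[o] * likelihood[o][h_obs] for o in prior}
    z = sum(post.values())
    return {o: p / z for o, p in post.items()}

# Hypothetical numbers: two outcomes, two possible heuristic readings.
prior = {"near": 0.5, "far": 0.5}
likelihood = {"near": {0: 0.8, 1: 0.2}, "far": {0: 0.3, 1: 0.7}}
print(update(prior, likelihood, h_obs=1))
```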

3.3. OVERVIEW: SEARCH IN BPS

Within a search tree representation, making a rational move requires an agent to explore the tree by selectively expanding relevant nodes - the ordering of heuristic evaluations and selective growth of the tree is beyond the scope of this paper, but is greatly facilitated by the explicit probabilistic interpretation of heuristic estimates. The heuristic evaluation of each new node provides evidence, causing the agent to update his beliefs about the relative desirability of his immediately available alternatives. With each heuristic evaluation we also expand the search tree, making more states available for subsequent evaluation. This process continues until the agent believes that committing to a move is preferable to further deliberation. At this point the move is made.

4. Representation of PHEs

The PHE provides a simple means of infusing domain-specific information into the problem-solving process by associating immediately visible features of a state with a belief about the outcome of that state. "Features" of the state are indicated by a heuristic function, $h$, and the association is provided by the heuristic estimate $P(O_i \mid h)$.

One base-level representation for the PHE is a histogram, $\mathbf{n}$, recording outcomes and associated heuristic values. This two-dimensional matrix is trivially updated as data are observed, providing the simplest form of empirical learning. Storing data in this manner is not meant to relegate probability to a frequency-ratio interpretation; rather, it is merely a convenient method for explicitly and impartially compiling observed experience. The representation also simplifies the task of sharing experience among agents.

The interpretation of this base-level representation occurs at a higher level, and provides us with $P(O_i \mid h)$. There are a number of possible mechanisms for interpreting these data. In the absence of information to significantly constrain an interpretation of the raw numerical values, one may use the maximum entropy prior probability that is consistent with the available information [12]. For example, prior to any experience, we would like the conditional probability to reflect our uncertainty, thus $P(O_i \mid h)$ should be uniformly $1/K$. After gaining experience, reflected by $\mathbf{n}$, this prior belief should be updated to the posterior, $P(O_i \mid h, \mathbf{n})$.

Assuming this uniform prior, assessing $P(O_i \mid h, \mathbf{n})$ is precisely the much-studied multinomial estimation problem [6], for which we can use, e.g.,

$$P(O_i \mid h, \mathbf{n}) = \frac{n(O_i, h) + 1}{N + K},$$

where $n(O_i, h)$ is the number of occurrences of outcome $O_i$ and heuristic value $h$, $N$ is the number of observations in the $h$th column of the matrix, and $K$ is the number of rows in the matrix.

If we have more information (or are willing to make more assumptions) and believe that adjacent entries in the histogram are dependent, we can use the data in the histogram to build a high-level representation. For example, we may wish to assume that $P(O_i \mid h)$ is adequately represented by a member of a parameterized family of curves (e.g., Normal distributions, Legendre polynomials). The probability that each curve describes the data, $P(\phi \mid \mathbf{n})$ (where $\phi$ is a set of parameter values, and $\mathbf{n}$ is the histogram data), can then be calculated by the use of Bayes' rule, as $P(\mathbf{n} \mid \phi)$ is given by the curve (of course, we still need a prior $P(\phi)$). In general, we have a set of models (each specified by parameter values $\phi$), each providing predictions $P(O_i \mid h, \phi)$, which we weight by our belief in each model $P(\phi \mid \mathbf{n})$ [2,4].

There are also other methods for learning the association $P(O_i \mid h)$. For example, as heuristic evaluation functions can often provide an estimate of the distance between any two states, we can explore any region of the state-space graph and calibrate estimates to actual distances. In this way, we can bootstrap from no knowledge of the heuristic function, without having to solve even a single problem instance.


4.1. COMBINING HEURISTICS

Artificial Intelligence techniques have never offered powerful methods for combining heuristics. For example, in single-agent domains the best suggestion ventured in the literature is to take the maximum of different admissible heuristics [15]. Another approach, dating back to Samuel's checkers program, constructs a composite heuristic which is a linear combination of individual features.

As our heuristic estimates represent conditional probabilities, analytically sound methods for combining them are available. For example, if we assume that they are independent, we may simply multiply the conditional probabilities offered by each heuristic. Conditional independence is the only sound assumption in the absence of additional information [1].
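Under this independence assumption, and a uniform prior over outcomes, the combined belief is simply the normalized product of the individual conditionals. A minimal sketch, with illustrative numbers of our own:

```python
# Sketch: under conditional independence and a uniform prior over outcomes,
# P(O | h_1, ..., h_k) is proportional to the product of the individual
# conditionals P(O | h_j); we multiply and renormalize.

import math

def combine(conditionals, outcomes):
    post = {o: math.prod(c[o] for c in conditionals) for o in outcomes}
    z = sum(post.values())
    return {o: p / z for o, p in post.items()}

# Two hypothetical heuristics agreeing that shorter paths are more likely.
h1 = {4: 0.5, 6: 0.3, 8: 0.2}
h2 = {4: 0.6, 6: 0.3, 8: 0.1}
print(combine([h1, h2], outcomes=[4, 6, 8]))
```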

With additional information, however, such as a belief that $k$ heuristics are highly correlated, we can instead choose from a family of parameterized $(k+1)$-dimensional curves, by determining the probability that each set of parameters is justified by available data. It is heartening to note that the current world-champion Othello program uses a composite heuristic which was autonomously learned by similar techniques [11].

An effective method for combining heuristics, and more importantly, the view that heuristic functions act merely to discriminate among outcomes, can also guide the construction of heuristic functions from combinations of low-level features.

4.2. INTEGRATING CONTROL KNOWLEDGE

Another common use of "heuristic" is to describe rules of thumb for the focus of effort, such as subgoaling strategies. In these, the set of goal states, $G$, is augmented by a set of subgoal states, $G_s$, toward which search should also be directed. This has the advantage of focusing search prior to achieving the subgoal, and of reducing the branching factor afterwards. Previous researchers had understandable difficulty in developing methods of directing search toward multiple subgoals, and successful systems often place stringent requirements (e.g., partial orders) on the nature of the subgoals.

Within our framework, however, use of a subgoal results from the belief that the subgoal states will lead to better outcomes - a subgoal corresponds directly to a heuristic function on all states. Any subgoal state will serve as an attractor, directing the agent toward it. However, his desire to find subgoals need never replace search for the goal itself - if, while pursuing a subgoal, another state appears preferable, it would be pursued instead.

4.3. OTHER KNOWLEDGE

Often, additional knowledge is available concerning the nature of the heuristic functions. For example, single-agent path-planning algorithms often employ admissible heuristic functions, those guaranteed to provide an underestimate of solution-path length. The knowledge that a heuristic is admissible is easily represented in our probabilistic heuristic estimates, as it involves only the elimination of certain outcomes (fig. 1).

Many admissible heuristics are also consistent, i.e., their estimates obey the triangle inequality of metrics. This implies that the heuristic estimates of adjacent states can differ by no more than the cost of the operator between them. Such knowledge about the dependence of heuristics can be explicitly represented in the search tree, resulting in a multiply-connected network. In this case, a node-clustering scheme can be used to render the network singly-connected [16]. The result, however, is equivalent to that produced by a transformation of the heuristic. If we transform a consistent heuristic function into one which returns the change in the heuristic estimate from a state to its parent, we can then assume independence in the resulting heuristic.
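A minimal sketch of that transformation (the function name is ours):

```python
# Replace a consistent heuristic h by its change along the tree. With unit
# operator costs, consistency confines the transformed value to {-1, 0, +1},
# and these difference values may more plausibly be treated as independent.

def delta_h(h, state, parent):
    return h(state) - h(parent)
```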

5. Related work

The Bayesian Problem-Solver sketched above is a departure from traditional search techniques. The probabilistic heuristic estimate is an extension of the heuristic function and the probabilistic inference engine is a particular form of backup strategy. This section briefly examines these contrasts.

5.1. HEURISTIC FUNCTIONS

In the literature of single-agent domains, heuristic values are said to estimate the length of solution-paths. In multi-agent domains, their semantics have never been well defined - informally, they are said to measure a state's "quality". Historically, nearly all heuristic search algorithms have used the face-value principle of heuristic interpretation, i.e., behaving as if these estimates were perfect. The necessary and sufficient condition for using the face-value principle is simply that $h(S_i) = U(S_i)$, the agent's utility function. Unless an agent believes that this condition holds, following the prescriptions of most existing heuristic search algorithms would violate the axioms of rationality upon which utility theory is based.

Some researchers have attempted probabilistic interpretations of heuristic functions; however, most differ from the outcome probability interpretation advocated here. Notable exceptions are found in [11] and [19].

5.2. BACKUP PROCEDURES

The problem of combining information in order to draw conclusions is commonly known as inference. However, in the study of state-space problem-solving, it is usually referred to as "backup". In essence, the many different backup procedures which are used in heuristic search algorithms represent competing inference calculi for reasoning about uncertainty.

Most backup procedures embody the basic principles of the famous Maxmin strategy [21], a logical inference procedure designed for evaluations under conditions of perfect information. The adaptation of Maxmin to domains requiring reasoning under uncertainty is known as Minimax [20]. Minimax uses the default calculus of ignoring uncertainty, by making the face-value assumption about heuristic functions, and a perfect-play assumption about opponents. Rather than ignore uncertainty, or invent an alternative representation for it, we make use of probability theory, which appears sufficient.

Another notable contrast to popular backup strategies is that we defy the conventional wisdom of ignoring evaluations of internal nodes (i.e., non-leaves), and instead make use of the valuable information they provide. This resurrects an abandoned idea of Turing's (from his hand-simulated chess program [13]).

6. Empirical results

The primary test domain for BPS has been the Eight Puzzle, a popular testing ground for heuristic search and problem-solving methods. An Eight Puzzle consists of a 3 × 3 frame containing eight numbered, sliding tiles. One of the positions in the frame does not contain a tile, giving rise to the single legal operator in this state-space: sliding any one of the adjacent tiles into the empty position.

In tests of full-width search applied to the Eight Puzzle, the probabilistic inference employed by BPS dramatically outperformed the most powerful existing algorithm, despite searching a shallower tree [7] - the two algorithms are of the same asymptotic time and space complexity per node. Using a histogram reflecting the outcomes of the states in the Eight Puzzle state-space, BPS achieved the same level of decision quality (probability of making a correct decision) as the Minimin algorithm, despite searching to less than half of the depth which Minimin required. For example, to reach 70% decision quality, BPS examined fewer than two hundred nodes, while Minimin required more than six million nodes.

7. Future work - Control of inference

One area for future research is to explore mechanisms of learning $P(O_i \mid h)$ from much smaller samples, and to plot the curve of performance vs. experience for different learning strategies - anecdotally, even with its experience limited to 1000 sample problems, BPS' performance did not suffer substantially [7]. A second is to explore coarsening the heuristic, or quantizing the values produced by $h$, and studying the resulting tradeoff between computational efficiency and discriminatory power.

A third and much broader area for future research involves control of inference. As the probabilistic interpretation of heuristic estimates enables the inference mechanism in BPS to be simple and homogeneous, the system can quickly calculate the effect of the heuristic evaluation of a leaf (merely a multiplicative Bayesian update). Further, the belief vectors in each heuristic node provide an anticipatory belief in the heuristic evaluation before it is performed. These two facts allow BPS to operate with explicit beliefs about the results of an experiment, and explicit knowledge about the impact of that result on other beliefs, and therefore choose among experiments based on the expected change in the utility of the decision [9,10,18,23].

By being able to determine the value of information, one can choose the heuristic evaluation which is most relevant to the decision at hand. The straightforward application of decision theory [5,19] suggests that any experimental result has a utility associated with it, reflecting the difference in the expected utility of the decision that we would make after the result, and the decision that we would have made before the result. Together with the probability of each possible experimental result, we should be able to calculate the expected utility of the experiment, and choose accordingly.
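A minimal sketch of this expected-value-of-information calculation; the probabilities and utilities below are illustrative assumptions of ours, not BPS data:

```python
# Sketch of the value of an experiment (a heuristic evaluation): the expected
# gain in decision EU, averaged over the anticipated experimental results.

def value_of_information(p_result, eu_after, eu_before):
    """p_result[r]: anticipated probability of result r;
    eu_after[r]: EU of the best decision once r is observed;
    eu_before: EU of the best decision made now, without the experiment."""
    return sum(p_result[r] * (eu_after[r] - eu_before) for r in p_result)

# An experiment is worth performing only if its value exceeds its cost.
print(value_of_information({0: 0.5, 1: 0.5}, {0: 0.9, 1: 0.6}, eu_before=0.7))
```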

8. Conclusions

The mapping of sensory features into probabilities of outcomes, by heuristic estimates, is the core of the approach described in this paper. The advantage of this view of heuristic information is that it enables explicit reasoning about beliefs. Without some link between the world of experience and the representation of belief, approaches to the fundamental tasks of problem-solving are necessarily inadequate.

Instead, the approach advocated here is one of action founded on belief and utility. The heuristic estimates provide an explicit representation of probabilities, which permits the use of probabilistic inference as a mechanism for reasoning under uncertainty, and enables the development of utility-directed control mechanisms for search algorithms.

This problem formulation promises the fusion of key principles from Artificial Intelligence, Bayesian statistics, and decision theory, to further the study of rational problem-solving.

References

[1] P. Cheeseman, In defense of probability, in: Proc. Int. Joint Conf. on Artificial Intelligence, Los Angeles (1985).


[2] P. Cheeseman and M. Self, Bayesian prediction for Artificial Intelligence, in: Proc. 3rd Workshop on Uncertainty in AI, Seattle (1987).

[3] R. Descartes, Discourse on Method: Optics (1637). [4] R.O. Duda and P.E. Hart, Pattern Classification and S~ene Analysis (Wiley, New York, 1973). [5] I.J. Good, A five year plan for automatic chess, Machine Intelligence 2 (1968). [6] I.J. Good, The Estimation of Probabilities (MIT Press, Cambridge, 1965). [7] O. Hansson and A. Mayer, Heuristic search as evidential reasoning, In: Proc. 5th Workshop on

Uncertainty in AL Windsor, Ontario (1989). [8] O. Hansson and A. Mayer, The optimality of satisficing solutions, in: Proc. 4th Workshop on

Uncertainty in AI, Minneapolis (1988). [9] E. Horvitz, Reasoning under varying and uncertain resource constraints, in: Proc. National

Conf. on Artificial Intelligence, Minneapolis (1988). [10] R.A. Howard, Information value theory, IEEE Trans. Systems, Man, and Cybernetics, SSC-2

(1965) 22-26. [11] K.-F. Lee and S. Mahajan, A pattern classification approach to evaluation function learning,

Artificial Intelligence 36 (1988). [12] E.T. Jaynes, Papers on Probability, Statistics and Statistical Physics, ed. R.D. Rosenkrantz

(Reidel, Dordrecht, 1983). [13] A. Newell, J.C. Shaw and H.A. Simon, Chess-playing programs and the problem of complex-

ity, in: Computers and Thought, eds. E.A. Feigenbaum and J. Feldman (McGraw-Hill, New York, 1963).

[14] A. Newell and H.A. Simon, Human Problem Solving (Prentice-Hall, Englewood Cliffs, N J, 1972).

[15] J. Pearl, Heuristics (Addison-Wesley, Reading, MA, 1984). [16] J. Pearl, Probabilistic Reasoning in Intelligent Systems (Morgan Kaufmann, San Mateo, CA,

1988). [17] H. Ralffa and R.L. Keeney, Decisions with Multiple Objectives: Preferences and Value Tradeoffs

(Wiley, New York, 1976). [18] H. Raiffa and R. Schlalfer, Applied Statistical Decision Theory (Harvard University, 1961). [19] S.J. Russell and E. Wefald, An optimal game=tree search using rational meta-reasoning, in:

Proc. Int. Joint Conf. on Artificial Intelligence, Detroit (1989). [20] C.E. Shannon, Programming a computer for playing chess, Philos. Mag. 41 (1950) 256-275. [21] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior (Princeton

University, 1944). [22] D. von Winterfeldt and W. Edwards, Decision Analysis and Behavioral Research (Cambridge

University Press, 1986). [23] A. Wald, Statistical Decision Functions (Wiley, New York, 1950).