
Combining Evidences for Evidential Reasoning

J. F. Baldwin
Engineering Mathematics Department, University of Bristol, England

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 6, 569-616 (1991)
© 1991 John Wiley & Sons, Inc.

An iterative procedure is described as a generalization of Bayes' method of updating an a priori assignment over the power set of the frame of discernment using uncertain evidence. In the context of probability kinematics the law of commutativity holds and the convergence is well behaved. The probability assignment of each updating evidence is retained. A general assignment method is also discussed for combining evidences without reference to any prior. The methods described here can be used in the field of Artificial Intelligence for common-sense reasoning and more specifically for treating uncertainty in Expert Systems. They are also relevant for nonmonotonic reasoning, abduction, and learning theory.

I. INTRODUCTION

A. Evidential Reasoning

In this article we will discuss the representation of uncertain evidence with respect to a frame of discernment X, how to combine evidences, and how to successively update a prior probability assignment over the power set of the frame of discernment with each uncertain evidence in turn. Each evidence is expressed as an assignment over the power set of the frame of discernment. The relationship of these methods to that of the programming language FRIL is discussed. Methods of decomposition are suggested and compared with the work of Pearl. It is also indicated how the methods can be used when the evidences are expressed in fuzzy set form.

This form of conditioning is an extension of Bayes' method of updating to the case where uncertain evidences are used. It has many applications in Artificial Intelligence including nonmonotonic logic, expert systems, abductive and inductive reasoning, etc.

We use the term assignment in the same sense that Shafer uses the term basic probability assignment or mass assignment in his theory of evidential reasoning.¹ If the assignment is over a partition of the power set then it is a probability distribution over the partition. The prior assignment can be subjective or represent frequencies with respect to a certain sample space of objects. Each evidence can be thought of as expressing a belief that a particular object or group of objects from the sample space satisfies a certain property or set of properties. The prior must then be updated in light of this new evidence to produce a new assignment relevant to the more specific type of object under consideration.

A simple example is as follows. A prior assignment, based on the type of crime, is given for various suspects being guilty of a certain crime. It is also known which of these suspects are left-handed, who are tall, and who are middle-aged. From initial enquiries it is believed that the person who committed the crime is left-handed, tall, and middle-aged. These beliefs are not certain and only probabilities can be given for them. This corresponds to uncertain evidence. Each piece of uncertain evidence can be used to update the prior assignment over the set of suspects.

If the evidences were certain we could deduce that the criminal belonged to the intersection of the sets of suspects who were left-handed, tall, and middle-aged. This corresponds to combining the evidences. An assignment over this intersection of sets could then be given by giving each member an assignment proportional to his a priori assignment and normalizing so that the sum of the assignments of each member is 1.

We will give a general assignment method, which generalizes this method of combining the evidences when they are uncertain. Further, an iterative assignment method will be given which generalizes the updating of the a priori assignment used above for the case when the evidence is uncertain.

If the evidences are certain in the example being discussed we can update the a priori assignment with each evidence in turn to obtain a new assignment using the method indicated above. Thus the a priori is first updated using the fact that the criminal is left-handed, to produce an updated assignment. This is further updated using the fact that the criminal is tall, and once again updated with the middle-aged evidence. The final assignment will be the same as if we had combined the evidences first and then updated the prior assignment using this combined evidence. The order of presenting the evidences is not important. The same final assignment will be obtained whatever order of presenting the evidences is used. The updating method when certain evidences are used is equivalent to using Bayes' theorem.

The iterative assignment method we will develop below can also be used to sequentially update the prior assignment using each piece of uncertain evidence in turn. One step of this iteration is equivalent to using Jeffrey's formula.² When the present assignment is updated using a new uncertain evidence an updated assignment is produced which is consistent with the uncertain evidence being used for the update. Previous evidences used will not necessarily be retained. The iteration is necessary to obtain a final assignment for which all evidences are retained. The order of presenting the evidences is not important for this method. The final assignment will be a member of the family of assignments found using the general assignment method of combining the evidences in the absence of any prior.

A review of existing work in probability kinematics can be found in Garbolino.³,⁴ Work relevant to this study, in addition to the references already cited, is found in Refs. 5 through 16. Field¹⁷ introduces a modification of Jeffrey's formula which satisfies the commutative law of probability kinematics, but we have not discussed this in this article. Ichihashi and Tanaka¹⁸ discuss modifications to the Shafer-Dempster method of combining evidences, and one of their methods is equivalent to the iterative method discussed in this article when convergence occurs in one step. Dubois and Prade¹⁹ also suggest extensions to the Shafer-Dempster approach. Yager²⁰ discusses extensions using fuzzy sets and Smets²¹ discusses semantic aspects of the Shafer theory of belief functions.

The iterative assignment method converges to a solution which is equivalent to minimizing the relative information of the new assignment with respect to the prior, subject to the constraints that all evidences are satisfied. One stage of the iterative assignment method, updating a prior using uncertain evidence, say E, is equivalent to minimizing the relative information of the updated assignment relative to the prior subject to the constraint that E is satisfied. The relative entropy, or more strictly the information content of the new assignment relative to the old, is a sort of distance measure. Thus the new assignment is the nearest assignment to the prior, where near is interpreted in terms of information content, which satisfies the evidence used to update the prior. Similarly, when updating using several evidences, the final update is the nearest assignment to the prior consistent with all the evidences.

The final assignment can be used to provide answers to various questions of interest. These answers take into account various probabilistic relationships between variables expressed using the a priori assignment and new evidences expressed in probabilistic terms. The method outlined can be used for deduction and abduction. For example, if the tennis court is flooded the method would abduce that it has rained heavily recently. This does not follow logically, since many events imply flooded courts; all that can be deduced from the fact that the court is flooded is that something which implies a flooded court must have taken place. It is possible that the maintenance man, while watering the courts, had a heart attack and was unable to turn off the water supply before his death, and that no one discovered this, so the courts became flooded. This scenario is unlikely compared with heavy rain, so it would be given a low probability compared with the probability that it rained, given the evidence of the flooded courts.

The methods described here can be used in the field of Artificial Intelligence for common-sense reasoning and more specifically for treating uncertainty in Expert Systems. They are also relevant for nonmonotonic reasoning, abduction, and learning theory.

B. Probability Logic

The family of assignments given by the general assignment method will correspond to all assignments consistent with the evidences. If these evidences correspond to probabilistic propositions expressing improper axioms of a theory then any probabilistic proposition that is consistent with all members of this family of assignments is a theorem. Any a priori assignment which satisfies the improper axioms is a model of the theory. The iterative assignment method updates the a priori assignment to one of these models.


C. Fuzzy Uncertainty

We do not wish to suggest that all evidential reasoning under uncertainty can be treated using the probabilistic methods given in this article. Fuzzy uncertainty plays an equally important role in evidential reasoning and corresponds to imprecise definitions of concepts rather than uncertain evidences expressed as assignments over a set of labels which are well defined. The theory of fuzzy sets²² plays an important role in this respect.

The frame of discernment, X, is a set of labels. These labels can be semantically understood with respect to the real world. For example, the names in the example above identified actual persons. More generally the frame of discernment will be defined as the combinations of instantiations of variables. Suppose, for example, that all evidences and conclusions of interest can be identified with respect to the variables X, Y, and Z. Further suppose that the variables can be instantiated to values in the sets {x1, . . ., xn}, {y1, . . ., ym}, and {z1, . . ., zr}, respectively. The frame of discernment is the set of all points in the product space X × Y × Z, that is, {(xi, yj, zk)}. The a priori assignment and all evidences are expressed over the power set of this frame of discernment. If many variables are required and/or many instantiations of the variables are required then the above methods of evidential reasoning will become computationally excessive. We will show later how these problems can be decomposed into subproblems of lower dimension by exploiting various dependency relationships, but even then some of the subproblems can have frames of discernment with an excessive cardinality.

One way of reducing the computational effort is not to allow the variables of interest to have too many possible instantiations. For example, we could allow the height of a person to be expressed as belonging to one of a certain set of intervals. Alternatively, we could allow the height variable to take fuzzy sets, expressed on the height space, as its possible instantiations. For example the labels "very tall," "tall," "above average," "average," "below average," "short," and "very short" could be used instead of the intervals. The advantage of using fuzzy sets has been well illustrated in the field of fuzzy control. The overlapping nature of fuzzy sets allows a few rules to be used for control purposes. These rules interact through the overlapping of the fuzzy sets so that for any set of fuzzy-valued inputs several of the rules will contribute to estimating the required control. This provides a robustness of behavior which gives a stability to the controller even when input conditions stray from those cases considered when the rules were constructed.

Both the general assignment and iterative assignment methods can be adapted to allow the frame of discernment to have members which are fuzzy labels. In addition the uncertain evidences can also be expressed as probability assignments over a set of fuzzy labels. These do not have to be the same labels as used in the frame of discernment but will of course be over the same product space as the labels of the frame of discernment. The general assignment method can be generalized to allow for intersections of fuzzy sets and a similar generalization can be made for the iterative assignment method. It is required that the intersection of two fuzzy sets, corresponding to labels in different evidences or to labels in one evidence and a prior assignment, is a fuzzy set corresponding to a label in the frame of discernment. If this is not the case, then a transformation is required to transform an assignment over a set of fuzzy labels to an assignment over the set of labels of the frame of discernment. This transformation is done in two parts:

(1) A weighted assignment is given to the set of labels where each label in the set corresponds to an intersection of one of the evidence labels and one of the labels of the prior assignment. The weights correspond to the maximum membership level of each of the sets resulting from intersection of two labels.

(2) The assignments of those labels not in the frame of discernment are transferred to subsets of the frame of discernment.

We will discuss the idea of using fuzzy labels through some simple examples but the complete generalization will be left for a future article.

D. Incompatible Evidences

If evidences are given for which the general assignment method gives an empty set, then the evidences are said to be incompatible. This means that there is no assignment over the frame of discernment which satisfies all of the evidences. In this case the iterative assignment method will converge in the sense that a solution will be obtained which, if treated as a prior and then updated successively using each piece of evidence in turn, will return the same solution. The evidences will not all be retained, and convergence in the sense that all individual updates are the same as the prior assignment is not obtained.

E. Relation to Dempster-Shafer Theory of Evidence

The general assignment method, for a given proposition P, gives an interval defined by a support pair [Sn(P), Sp(P)] which, in general, is different from the interval [Sl(P), Su(P)] given by the Dempster-Shafer theory. Suppose two evidences contain labels {ai} and {bj}, respectively, which correspond to subsets of the frame of discernment. Suppose also that the two labels "ai" and "bj" correspond to sets which have no intersection; then the general assignment method assigns 0 to the combination of these two labels, corresponding to allocating 0 to the null set. The Dempster-Shafer theory initially allocates an assignment equal to the product of the assignments of the two labels to this combination and then redistributes it to the other label combinations corresponding to nonnull intersections, in proportion to the initial assignments. The general assignment can be understood in probability terms. The Shafer approach would appear to be more difficult to interpret. The Dempster-Shafer method is applicable even when the evidences are incompatible, which is not true for the general assignment approach. In the special case when the two evidences to be combined do not give any label combinations corresponding to null sets, the two approaches give the same solution. In order to justify the Dempster-Shafer approach it is necessary to provide an interpretation for which the Dempster rule is applicable. Shafer discusses this in relation to messages but we will not discuss this further in this article. If an assignment is obtained by maximizing the entropy subject to the evidence constraints, then the assignment or set of assignments obtained is a member of the family of assignments given by the general assignment method.

F. Content

In the following we describe the general assignment method of combining evidences and the iterative assignment method of successively conditioning a prior assignment over the frame of discernment with various evidences. Examples are given to illustrate the methods. It is also illustrated how causal methods such as used by Pearl²³ and other decompositional ideas can be used to decompose a problem into subproblems.

The relation of the iterative assignment method to the support logic programming language FRIL and that of the general assignment method to probability logic is given. The language FRIL through support logic programming is a special case of the iterative assignment method. For a rule which expresses the probability of the head given the body and the probability of the head given the negation of the body, the iterative assignment method gives the FRIL solution if the distribution corresponding to maximum entropy subject to the conditional probability constraints of the FRIL rule is chosen as the prior assignment and the probabilities of the conjuncts of the body are given as evidences.

Proofs of convergence for the iterative assignment method, and proofs that the converged solution corresponds to a member of the family of general assignment solutions and that this corresponds to minimizing the relative information with respect to the prior assignment, are given for a particular example. More general proofs will be given in a future article.

The general iterative assignment method provides an approach to probability kinematics, a term coined by Jeffrey,² which preserves the law of commutativity and is consistent with first combining the evidences and then updating the prior. It is an extension of Bayes' method of conditioning to uncertain information and is equivalent to Bayes' method when the evidences are certain. The simple use of Jeffrey's formula for probability kinematics does not lead to sensible results except in very special cases. The law of commutativity is not in general preserved under Jeffrey conditioning, and the Jeffrey solution often does not correspond to a solution in the set of possible assignments consistent with the evidences. The iterative assignment method is consistent with the method of Pearl when this is applicable but is more general. It deals with uncertain evidences given as an assignment over the power set of the frame of discernment and does not require that a distribution is given over a partition. Further, the method can use fuzzy sets as labels for the elements of the frame of discernment.


II. GENERAL ASSIGNMENT METHOD

A. Support Pairs

A support pair, denoted by S = [Sn, Sp], defines a convex set of probabilities S such that if s1 ∈ S and s2 ∈ S then a·s1 + (1 − a)·s2 is also a member of S for all a ∈ [0, 1]. If S1 = [α1, β1] and S2 = [α2, β2] then the intersection S = S1 ∩ S2 = [α, β] where α = MAX{α1, α2} and β = MIN{β1, β2}. The union of S1 and S2, S1 ∪ S2, is given by [α, β] where α = MIN{α1, α2} and β = MAX{β1, β2}, and the complement of S, denoted by S̄, is given by [1 − β, 1 − α].
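These interval operations translate directly into code. The following is a minimal sketch in Python (the function names are illustrative assumptions, not from the paper); the sample values anticipate the logic example of Section II.E.

```python
# A minimal sketch of the support-pair operations defined above.

def sp_intersection(s1, s2):
    # S1 ∩ S2 = [MAX{a1, a2}, MIN{b1, b2}]; None if the result is empty
    a, b = max(s1[0], s2[0]), min(s1[1], s2[1])
    return (a, b) if a <= b else None

def sp_union(s1, s2):
    # S1 ∪ S2 = [MIN{a1, a2}, MAX{b1, b2}]
    return (min(s1[0], s2[0]), max(s1[1], s2[1]))

def sp_complement(s):
    # complement of [a, b] is [1 - b, 1 - a]
    return (1.0 - s[1], 1.0 - s[0])

print(sp_intersection((0.3, 0.6), (0.2, 0.5)))  # (0.3, 0.5); cf. Sec. II.E
print(sp_complement((0.3, 0.5)))                # (0.5, 0.7)
```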

B. General Assignment Method for Combining Evidences

Let H be the set {H1, H2, . . . , Hn}, where the Hi are disjoint sets, and let 2^H denote the power set of H. Let P be a probability assignment over 2^H defined by the basic assignment function p: 2^H → [0, 1] such that p({ }) = 0 and Σ_{X ⊆ H} p(X) = 1. The sets X for which p(X) ≠ 0 are called focal elements. We can associate with any set Y, Y ⊆ H, a support pair given by

S(Y) = [Sn(Y), Sp(Y)] where Sn(Y) = Σ_{X: X ⊆ Y} p(X) and Sp(Y) = Σ_{X: X ∩ Y ≠ { }} p(X)

where { } is the null set. Thus Sn(Y) can be interpreted as the necessary support for Y and Sp(Y) as the possible support for Y.

We can also give a logic interpretation. Let (H, h) be a propositional space consisting of a frame of discernment H and a Boolean field of propositions h which corresponds to the power set of H. Thus the logic statement X ∨ Y corresponds to the set theoretic statement X ∪ Y; similarly X ∧ Y corresponds to X ∩ Y, ¬X to the complement X̄, X ⊃ Y to X̄ ∪ Y, X ≡ Y to X = Y, and any incompatible statement X ∧ ¬X to the null set { }. P is the probability assignment over h and is equivalent to P defined above, where p(X) for some proposition X is equal to the probability of the corresponding set X ⊆ H. We can associate with any proposition Y ∈ h a support pair S(Y) = [Sn(Y), Sp(Y)] as above.

If evidence E1 provides a basic probability assignment P1 for the set H, and E2 similarly provides a probability assignment P2 over H, then we can combine the evidences by combining the probability assignments P1 and P2 to produce a family of probability assignments P1 ⊕ P2 over H which, for each assignment, is a mapping

p1 ⊕ p2: 2^H → [0, 1]

satisfying the constraints

Σ_{Y} m(X, Y) = p1(X) for each focal element X of P1
Σ_{X} m(X, Y) = p2(Y) for each focal element Y of P2

where m(X, Y) is the mass allocated to the intersection X ∩ Y, p1 ⊕ p2(Z) is the sum of m(X, Y) over all X, Y with X ∩ Y = Z, and

m(X, Y) = 0 for any focal element X of P1 and any focal element Y of P2 such that X ∩ Y = { }.

This assignment is not in general unique. We will call this the general assignment method of combining evidences. The set of solutions given by the general assignment method is called the assignment family of solutions.
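The family of allocations is a linear system, so the support pair of any set over the whole family can be found by linear programming. The following is a sketch, not the paper's own algorithm, assuming SciPy is available; evidences are dicts from frozenset focal elements to masses, and all names are illustrative.

```python
# Sketch: the general assignment method as a linear program.
from itertools import product
from scipy.optimize import linprog

def support_pair(e1, e2, y):
    """[Sn(Y), Sp(Y)] over the family of combined assignments, or None."""
    f1, f2 = list(e1), list(e2)
    # one variable per cell (X, Z) of the tableau with X ∩ Z nonempty;
    # cells with empty intersection are forced to 0 by omission
    cells = [(a, b) for a, b in product(f1, f2) if a & b]
    a_eq, b_eq = [], []
    for a in f1:   # row constraints: row sums equal the E1 masses
        a_eq.append([1.0 if ca == a else 0.0 for ca, _ in cells])
        b_eq.append(e1[a])
    for b in f2:   # column constraints: column sums equal the E2 masses
        a_eq.append([1.0 if cb == b else 0.0 for _, cb in cells])
        b_eq.append(e2[b])
    def solve(c):
        r = linprog(c, A_eq=a_eq, b_eq=b_eq, bounds=[(0.0, 1.0)] * len(cells))
        return r.fun if r.success else None  # None: evidences incompatible
    # Sn(Y): minimum total mass on cell labels contained in Y
    sn = solve([1.0 if (a & b) <= y else 0.0 for a, b in cells])
    # Sp(Y): maximum total mass on cell labels intersecting Y
    sp = solve([-1.0 if (a & b) & y else 0.0 for a, b in cells])
    return (sn, -sp) if sn is not None and sp is not None else None

# Example, anticipating Sec. II.E: A = {'a'}, ¬A = {'na'}, U = A ∪ ¬A
A, NA = frozenset({"a"}), frozenset({"na"})
U = A | NA
print(support_pair({A: 0.3, NA: 0.4, U: 0.3},
                   {A: 0.2, NA: 0.5, U: 0.3}, A))   # (0.3, 0.5)
```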

C. An Example

Consider H = {a, b, c, d, e, f} and two probability assignments P1 and P2. Combining them by the general assignment method gives a family of assignments subject to row and column constraints: the probabilities in any column of the tableau must add up to the corresponding column probability associated with E2, and any row probabilities must add up to the corresponding row probability associated with E1. The entries in the table define the focal elements of H w.r.t. the combined assignment P1 ⊕ P2.

We can associate a support pair with any focal element of H w.r.t. P1 ⊕ P2 as described above. Some of these are given in terms of the assignment family parameters and are not strictly support pairs, since Sn is a necessary support for, and (1 − Sp) a necessary support against, and each should therefore take its minimum value over the family of probability assignments. For the set {b, c} we should find the smallest x to give the necessary support, and this is 0, so that

Nec Support for {b, c} = 0 and Pos Support for {b, c} = 1, i.e., S({b, c}) = [0, 1]

Similarly, for x = 0, choosing the remaining family parameters w and z at their extreme values determines the necessary support against {b, c}, and hence the support pair for {b, c}.

D. The Conjunction Example

The general assignment method of combining evidences given on the same frame of discernment can be generalized to the case where the individual probability mass assignments are given on different frames of discernment. For example, finding the probability of the conjunction of two propositions P, Q when only Pr(P) = x and Pr(Q) = y are known can be viewed as the following assignment problem.

We give this as an example since we will later discuss the iterative assignment method, its convergence, and its relationship to minimizing the information content of the updated assignment with respect to an a priori assignment, both defined on the frame of discernment {P∧Q, P∧¬Q, ¬P∧Q, ¬P∧¬Q}.

Allocate the table entries in the following table so that the row and column constraints are satisfied. A constraint is satisfied if the table entries in a given column (row) add up to the column (row) entry.

               P (x)              ¬P (1 − x)
Q (y)          Q∧P: t             Q∧¬P: y − t
¬Q (1 − y)     ¬Q∧P: x − t        ¬Q∧¬P: 1 − x − y + t

where MAX{0, x + y − 1} ≤ t ≤ MIN{x, y}, so that

MAX{0, x + y − 1} ≤ Pr(P ∧ Q) ≤ MIN{x, y}

This solution family contains the maximal entropy solution, since if we wish to choose {Pi} in the table above, with P1 = P∧Q, P2 = ¬P∧Q, P3 = P∧¬Q, P4 = ¬P∧¬Q, such that S = −Σ Pi ln Pi is maximized subject to

P1 + P3 = x    P2 + P4 = 1 − x
P1 + P2 = y    P3 + P4 = 1 − y
P1 + P2 + P3 + P4 = 1

then

P1 = xy    P2 = y(1 − x)    P3 = x(1 − y)    P4 = (1 − x)(1 − y).


The probability of the conjunction P ∧ Q is the product of the probabilities of the conjuncts under maximum entropy considerations, as it is under the assumption of independence of P and Q.

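Both the interval and the maximum-entropy member of the family are simple to compute. A small sketch (illustrative names; the values used reappear in Sec. IV.B):

```python
def conjunction_bounds(x, y):
    # family of possible values for Pr(P∧Q) given Pr(P) = x, Pr(Q) = y
    return max(0.0, x + y - 1.0), min(x, y)

def max_entropy_cells(x, y):
    # P1..P4 = P∧Q, ¬P∧Q, P∧¬Q, ¬P∧¬Q at maximum entropy (= independence)
    return x * y, (1.0 - x) * y, x * (1.0 - y), (1.0 - x) * (1.0 - y)

print(conjunction_bounds(0.9, 0.75))  # (0.65, 0.75)
print(max_entropy_cells(0.9, 0.75))   # P∧Q = 0.675
```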

E. A Logic Example

Consider a proposition A whose truth is derived in two different ways. Each method, because of the uncertainty of other propositions, provides a support for and a support against A.

Method 1: support for A = 0.3; support against A = 0.4
Method 2: support for A = 0.2; support against A = 0.5

We can combine these two evidences for A using the following tableau, applying the general assignment method, where U stands for (A ∨ ¬A). The rows carry the assignment of Method 1 and the columns that of Method 2:

              A (0.2)     ¬A (0.5)     U (0.3)
A (0.3)       0.2 − ξ     0            0.1 + ξ
¬A (0.4)      0           0.4 − η      η
U (0.3)       ξ           0.1 + η      0.2 − ξ − η

The probabilities are therefore given by

Pr(A) ∈ [0.3 + ξ, 0.5 − η]    Pr(¬A) ∈ [0.5 + η, 0.7 − ξ]

where ξ ≥ 0, η ≥ 0, and ξ + η ≤ 0.2, so that the support pairs are

S(A) = [0.3, 0.5]    S(¬A) = [0.5, 0.7]

The computer language FRIL (Baldwin et al.) uses an intersection rule to combine support pairs associated with different proof paths for solving a given query. For the above example this corresponds to combining the support pairs by intersecting the intervals associated with the support pairs. For example,

[0.3, 0.6] ∩ [0.2, 0.5] = [0.3, 0.5]


gives the support pair [0.3, 0.5] for A. This is equivalent to the result given by the general assignment method.

In general a support pair [x, y] for a proposition A corresponds to a probability mass of x allocated exactly to A, a probability mass of (1 − y) allocated exactly to ¬A, and a probability mass of (y − x) allocated exactly to (A ∨ ¬A). If the support pairs associated with A using two different proof paths are [x1, y1] and [x2, y2], respectively, then the support pair for the combination of the two proof paths is given by the intersection rule and is

[MAX{x1, x2}, MIN{y1, y2}]

If this intersection is empty then there is conflict between the solutions corresponding to the different proof paths and there is no solution. This corresponds to the case when there is no possible allocation in the general assignment method which will satisfy all the row and column constraints. If the intersection is not empty then it will give the same solution as the general assignment method.

F. Probability Logic

Consider the following probability logic example. Each object in a population of objects has property A true or false and also property B true or false. You are told that for 3/4 of the objects A ⊃ B is true and for 1/2 of the objects A is true. Therefore for an object selected at random:

Pr(A ⊃ B) = 3/4    Pr(A) = 1/2

We can consider these to be improper axioms of a theory. The general assignment method will then give all models of this theory. It is required to determine the support pair for B. The general assignment method gives an allocation tableau which determines the support pair.


Thus Pr(B) ∈ [1/4, 3/4] is a theorem of the theory, since it is true for all models of the theory.
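The theorem can be checked numerically with the support_pair sketch of Section II.B, assuming it is in scope; the atom names and axiom values below follow the reconstruction above.

```python
# Atoms of the propositional space: ab = A∧B, anb = A∧¬B, etc.
f = frozenset
ab, anb, nab, nanb = "ab", "anb", "nab", "nanb"
e1 = {f({ab, nab, nanb}): 0.75, f({anb}): 0.25}  # Pr(A ⊃ B) = 3/4
e2 = {f({ab, anb}): 0.5, f({nab, nanb}): 0.5}    # Pr(A) = 1/2
print(support_pair(e1, e2, f({ab, nab})))        # (0.25, 0.75): Pr(B) ∈ [1/4, 3/4]
```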

III. ITERATIVE UPDATING ASSIGNMENT METHOD

A. The Basic Iterative Assignment Method

Let H be a frame of discernment, Pa an a priori probability assignment over the power set 2^H, and Pu a more specific probability assignment over 2^H. How do we combine these probability assignments Pa and Pu to form Pa ⊕ Pu?

Consider the following method of assigning the probability assignment Pa ⊕ Pu. Let K_Y, for each focal element Y of Pu, be given by

K_Y = 1 / Σ_{Z: Z ∩ Y ≠ { }} pa(Z), Z a focal element of Pa.

The a priori probability assignment Pa is replaced by P′ where

p′(X) = Σ K_Y pa(Z) pu(Y), the sum taken over all Z, Y with Z ∩ Y = X, Y a focal element of Pu and Z a focal element of Pa;

p′(X) = 0 if there are no such Z, Y, and also p′({ }) = 0.

We denote this by P′ = T(Pa, Pu). This new probability assignment P′ is used with Pu to determine a further updated assignment P″ = T(P′, Pu). If P″ = P′ then Pa ⊕ Pu = P′; otherwise the process of updating is continued until the iteration converges. If the method converges to the assignment P* then

P* = T(P*, Pu)
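The step T(Pa, Pu) and its iteration are easily expressed in code. A sketch, assuming assignments are represented as dicts from frozenset focal elements to masses (names illustrative):

```python
def update(pa, pu):
    """One updating step P' = T(Pa, Pu)."""
    out = {}
    for y, mass_u in pu.items():
        # 1/K_Y: total prior mass on focal elements meeting Y
        norm = sum(m for z, m in pa.items() if z & y)
        if norm == 0:
            continue  # Y meets no prior focal element; its mass is lost
        for z, m in pa.items():
            x = z & y
            if x:     # cells with empty intersection receive nothing
                out[x] = out.get(x, 0) + (m / norm) * mass_u
    return out

def combine(pa, pu, tol=1e-12, max_iter=1000):
    """Iterate the update until P' = T(P', Pu), giving Pa ⊕ Pu."""
    p = dict(pa)
    for _ in range(max_iter):
        q = update(p, pu)
        keys = set(p) | set(q)
        if all(abs(q.get(k, 0) - p.get(k, 0)) < tol for k in keys):
            return q
        p = q
    return p
```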

B. Relative Entropy

Let the assignments Pa and Pu be represented by assignments over the sets of labels {lai} and {luj}, respectively. Each label is the name for a subset of H which is a focal element of the appropriate assignment. Let each element of H be a focal element of Pa, and let the labels be such that the set corresponding to the intersection of the sets associated with any labels lai and luj is itself associated with a label of Pa, for any i and any j. Let the probability assignment for label lai be pi, for all i, and let the probability of the updated assignment Pa ⊕ Pu for the label lai be p′i, for all i. Then the above algorithm is equivalent to the following optimization:

MIN Σi p′i ln(p′i / pi)

subject to the constraint Pu, where the constraint Pu consists of all the assignments associated with Pu, and {pi}, {p′i} are sets of probabilities which sum to 1 in each case.

This will be illustrated for the conjunction example below and given in detail with all its consequences in a future article.

C. An Example

A box contains balls, some of which are red (R) and some blue, some large (L) and some not large. The box contains 12 balls, and the distribution of red and large balls is given in the following table.

         L    ¬L
R        2    4
¬R       3    3

Each ball is equally likely to be drawn from the bag, so that the a priori information Pa is given by

Pr(R ∧ L) = 2/12   Pr(R ∧ ¬L) = 4/12   Pr(¬R ∧ L) = 3/12   Pr(¬R ∧ ¬L) = 3/12

Half the balls are randomly removed from the bag and you are told that for the new bag of six balls five of them are large. Thus Pu is given by Pr(L) = 5/6, i.e.,

Pr({(R ∧ L), (¬R ∧ L)}) = 5/6    Pr({(R ∧ ¬L), (¬R ∧ ¬L)}) = 1/6

The iterative assignment method gives the following updating tableau:

Pu:                  {L ∧ R, L ∧ ¬R} (5/6)    {¬L ∧ R, ¬L ∧ ¬R} (1/6)
Pa: R ∧ L (2/12)     R ∧ L: 1/3               { }: 0
    R ∧ ¬L (4/12)    { }: 0                   R ∧ ¬L: 2/21
    ¬R ∧ L (3/12)    ¬R ∧ L: 1/2              { }: 0
    ¬R ∧ ¬L (3/12)   { }: 0                   ¬R ∧ ¬L: 1/14

K1 = 12/5    K2 = 12/7

A normalizing constant Ki is associated with each column of the tableau. For each column the normalizing constant is equal to the inverse of the sum of the prior assignments of those labels which intersect with the label of Pu associated with this column. Zero assignments are given to those cells in the column whose Pa label has no intersection with the column label of Pu. For the other cells in the column the assignment is given as the product of the normalizing constant, the assignment in the same row of Pa, and the column assignment of Pu. When we talk of intersection of labels we mean the intersection of the sets of H associated with the labels. Each cell is also given a label corresponding to the intersection of the sets of the row label of Pa and the column label of Pu. The new assignment for a label is the sum of the assignments of cells with that label.

Repeating with P′ as the a priori probability assignment, with rows R ∧ L (1/3), R ∧ ¬L (2/21), ¬R ∧ L (1/2), ¬R ∧ ¬L (1/14), gives K1 = 6/5 and K2 = 6. This produces the same solution, so that Pa ⊕ Pu = P′, that is,

Pr(R ∧ L) = 1/3   Pr(R ∧ ¬L) = 2/21   Pr(¬R ∧ L) = 1/2   Pr(¬R ∧ ¬L) = 1/14

We can show that this is the correct answer using elementary probability arguments. Since for the new sample space Pr(L) = 5/6, we know that the new bag will contain two balls satisfying R ∧ L and three balls satisfying ¬R ∧ L. Furthermore, one ball will be chosen from the seven non-large balls, of which four satisfy R ∧ ¬L and three satisfy ¬R ∧ ¬L. Thus two sample spaces are possible: with probability 4/7 the sixth ball satisfies R ∧ ¬L, and with probability 3/7 it satisfies ¬R ∧ ¬L, so that

Pr(R ∧ L) = (2/6)(4/7) + (2/6)(3/7) = 1/3

and similarly for the other probabilities derived above.
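Running the combine sketch from Section III.A on this example (assuming it is in scope, and using exact arithmetic via fractions) reproduces the tableau's converged values:

```python
from fractions import Fraction as F

f = frozenset
RL, RnL, nRL, nRnL = f({"RL"}), f({"RnL"}), f({"nRL"}), f({"nRnL"})
pa = {RL: F(2, 12), RnL: F(4, 12), nRL: F(3, 12), nRnL: F(3, 12)}
pu = {RL | nRL: F(5, 6), RnL | nRnL: F(1, 6)}   # Pr(L) = 5/6
for k, v in combine(pa, pu).items():
    print(set(k), v)   # RL: 1/3, nRL: 1/2, RnL: 2/21, nRnL: 1/14
```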

D. Jeffrey’s Formula

Suppose P0 is a prior probability distribution assigning probabilities p1⁰, p2⁰, . . . , pn⁰ to n mutually exclusive and jointly exhaustive events. Further suppose that new evidence establishes some proposition E in the domain of the prior distribution P0 with probability q. Jeffrey² proposed that the posterior distribution be given by the conditioning rule

P(A) = q P0(A | E) + (1 − q) P0(A | ¬E)

Jeffrey argues that if an experience causes an agent to change his belief in E from Po(E) to q then he should change his belief in A according to the above formula.

We do not wish to discuss the philosophical issues of the applicability of Jeffrey's formula. The equivalence of this formula to that obtained by minimum relative information considerations is interesting, but not Jeffrey's justification of the formula. In this article we accept the minimum relative information criterion for updating an a priori assignment with a set of evidences.

For the example of the previous section, the a priori information gives

Pr(R | L) = 2/5    Pr(R | ¬L) = 4/7

The new information gives

Pr(L) = 5/6

so that using Jeffrey's formula we obtain

Pr′(R) = (5/6)(2/5) + (1/6)(4/7) = 3/7

which is the same result as the updating assignment method, since

Pa ⊕ Pu(R) = Pa ⊕ Pu(R ∧ L) + Pa ⊕ Pu(R ∧ ¬L) = 1/3 + 2/21 = 3/7


In fact, the updating assignment method is equivalent to using Jeffrey's formula for combining the evidences in this case. The Jeffrey formula is applicable if Pr(R | L) = Pr′(R | L) and Pr(R | ¬L) = Pr′(R | ¬L), where the prime indicates that the probability is with respect to the sample space consistent with the new information; in this case this corresponds to the expected value over the different possible sample spaces. In the updating assignment method it is easily seen from the tableau that these conditions are satisfied, since Pr′(R | L) = (1/3)/(5/6) = 2/5 and Pr′(R | ¬L) = (2/21)/(1/6) = 4/7.

E. Example Revisited Using Support Pairs for Evidence

Consider the following modification of the problem given above.

Pa: an a priori assignment over the five focal elements

R ∧ L, R ∧ ¬L, ¬R ∧ L, ¬R ∧ ¬L, {R ∧ L, R ∧ ¬L}

Pu: an assignment giving probabilities to {R ∧ L, ¬R ∧ L}, {R ∧ ¬L, ¬R ∧ ¬L}, and H, where

H = {X ∧ Y: X ∈ {R, ¬R} and Y ∈ {L, ¬L}}

The mass given to H makes the evidence for L a support pair rather than a point probability. The updating assignment tableau therefore has three columns, one for each focal element of Pu; the normalizing constant K3 for the H column is always 1, since every label of Pa intersects H. Updating, updating again, and then performing a further four iterations produces the converged assignment

R ∧ L: 0.3428    R ∧ ¬L: 0.1429    ¬R ∧ L: 0.1143    ¬R ∧ ¬L: 0.0572
{R ∧ L, R ∧ ¬L}: 0.0    {R ∧ L, ¬R ∧ L}: 0.3428

with K1 = 1.2500, K2 = 4.9988, K3 = 1.0, so that

Sn(R) = 0.3428 + 0.1429 = 0.4857
Sp(R) = Sn(R) + 0.2857 + 0.0571 = 0.8286

and therefore Pa ⊕ Pu(R) ∈ [0.4857, 0.8286].


IV. PROBABILITY KINEMATICS

A. Successive Updating Using the Iterative Assignment Method

This term was coined by Jeffrey and involves, within our context, the stage-by-stage updating of the current a priori probability assignment with new evidence expressed as a probability assignment over the same frame of discernment. Discussion of the use of Jeffrey's rule and further extensions is given in Domotor, Zanotti, and Graves; Field¹⁷ also discusses this and introduces a different interpretation of the Jeffrey rule. Suppose the initial a priori assignment Pa is updated using P_E1, corresponding to evidence E1, to the assignment P_{a,E1}, and this is treated as the new a priori assignment to update using P_E2, corresponding to evidence E2, to the new assignment P_{a,E1,E2}. The commutative law requires that

P_{a,E1,E2} = P_{a,E2,E1}

Using Jeffrey's rule does not in general give results which are consistent with this commutative law.

The updating assignment method with iteration can be used for this conditionalizing and updating. First P_{a,E1} is determined using the updating assignment method, iterating until the update leaves the assignment unchanged. This is then treated as the new a priori assignment and updated with P_E2, again iterating, to give P_{a,E1,E2}. The a priori assignment is then replaced by this result and the whole process repeated, with further iterations, until the final assignment P* satisfies

P*(X) = P_E2(X) for all focal elements X of P_E2

and

P*(X) = P_E1(X) for all focal elements X of P_E1.

When these conditions are satisfied the commutative law holds. It is important that the probability assignment of evidence E1 can be recovered from the final assignment P*. The formulation of the frame of discernment must be chosen so that this is possible.

We will illustrate this with several examples. Conditions of convergence and entropy considerations will not be discussed further in this article. Comments relevant to this can be found in several of the papers cited in this area, although none, of course, relate to the actual iterations, since iteration is not used in present methods.
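A sketch of the whole kinematics loop, assuming the update step from the sketch in Section III.A is in scope: the pass over all evidences is repeated until the assignment is a fixed point of the complete pass.

```python
def kinematics(pa, evidences, tol=1e-10, max_pass=10000):
    """Successive updating with E1, E2, ..., En, iterated to convergence."""
    p = dict(pa)
    for _ in range(max_pass):
        q = dict(p)
        for pu in evidences:     # one complete pass over the evidences
            q = update(q, pu)
        keys = set(p) | set(q)
        if all(abs(q.get(k, 0) - p.get(k, 0)) < tol for k in keys):
            return q             # fixed point of the pass
        p = q
    return p
```

Checking for a fixed point of the whole pass, rather than for retention of each evidence, means the sketch also terminates for incompatible evidences, in the weaker sense of convergence described in Section I.D.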

B. Conjunction Example

We will first discuss an artificially posed problem of determining the probability of the conjunction of two statements, A ∧ B, when given the probability of each of the conjuncts. We will determine this by first selecting an a priori assignment over the set

{A ∧ B, A ∧ ¬B, ¬A ∧ B, ¬A ∧ ¬B}

and then presenting E1 as Pr(A) and then E2 as Pr(B). We know that the correct solution lies within the interval [MAX{Pr(A) + Pr(B) − 1, 0}, MIN{Pr(A), Pr(B)}]. We would expect the updating assignment method to give a point in this interval, its actual value depending on the a priori assignment chosen. We will find that this is the case, and if we choose an equally likely probability assignment we will obtain the maximum entropy solution found in part 1 of this article, Pr(A)·Pr(B). For our numerical example we choose Pr(A) = 0.9 and Pr(B) = 0.75. Thus

Pr(A ∧ B) ∈ [0.65, 0.75], and similarly Pr(¬A ∧ B) ∈ [0, 0.1], Pr(A ∧ ¬B) ∈ [0.15, 0.25], and Pr(¬A ∧ ¬B) ∈ [0, 0.1].

Using the updating assignment method we obtain the following results. Starting with the a priori {1/4, 1/4, 1/4, 1/4} for {A ∧ B, A ∧ ¬B, ¬A ∧ B, ¬A ∧ ¬B} the method iterates in one step to give

Pr(A ∧ B) = 0.675    Pr(A ∧ ¬B) = 0.225
Pr(¬A ∧ B) = 0.075   Pr(¬A ∧ ¬B) = 0.025

Starting with the a priori {0.1, 0.4, 0.4, 0.1} for {A ∧ B, A ∧ ¬B, ¬A ∧ B, ¬A ∧ ¬B} the method iterates to the solution

Pr(A ∧ B) = 0.6524    Pr(A ∧ ¬B) = 0.2476
Pr(¬A ∧ B) = 0.0976   Pr(¬A ∧ ¬B) = 0.0024

In this example one-step iterations suffice for each separate update E1 and E2, but the process as a whole requires several passes. After the first pass the solution for Pr(A ∧ B) is 0.5192, which is not even in the range of possible solutions, namely [0.65, 0.75]. The actual iterations are shown in the following table:

Atoms    A priori   Update   Update   Update   Update
a b        0.1      0.5192   0.6474   0.6522   0.6523
a ¬b       0.4      0.2432   0.2475   0.2477   0.2477
¬a b       0.4      0.2308   0.1026   0.0978   0.0977
¬a ¬b      0.1      0.0068   0.0025   0.0023   0.0023

Pr(a):     0.5      0.7625   0.8949   0.8998   0.9
Pr(b):     0.75     0.75     0.75     0.75     0.75
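The table above can be reproduced with the kinematics sketch of Section IV.A, assuming it and the update step are in scope:

```python
f = frozenset
ab, anb, nab, nanb = f({"ab"}), f({"anb"}), f({"nab"}), f({"nanb"})
prior = {ab: 0.1, anb: 0.4, nab: 0.4, nanb: 0.1}
e1 = {ab | anb: 0.9, nab | nanb: 0.1}     # Pr(A) = 0.9
e2 = {ab | nab: 0.75, anb | nanb: 0.25}   # Pr(B) = 0.75
p = kinematics(prior, [e1, e2])
print(round(p[ab], 4))                    # 0.6524
```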


Starting with the a priori {0.4, 0.1, 0.1, 0.4} for {A ∧ B, A ∧ ¬B, ¬A ∧ B, ¬A ∧ ¬B} the method iterates to the solution

Pr(A ∧ B) = 0.7290    Pr(A ∧ ¬B) = 0.1710
Pr(¬A ∧ B) = 0.0210   Pr(¬A ∧ ¬B) = 0.0790

In this example one-step iterations are required for each update but the process as a whole requires several iterations.

To produce high values for A ∧ B, choose the a priori probabilities for A ∧ B and ¬A ∧ ¬B high in comparison with the other two. Low values are obtained if these are chosen low in comparison with the other two. This is obvious from the general assignment solution.

We can modify this example by giving the a priori assignment over the set

{A ∧ B, A ∧ ¬B, ¬A ∧ B, ¬A ∧ ¬B, {A ∧ B, A ∧ ¬B}, {A ∧ B, ¬A ∧ B}}

and the updating assignments for A and B over the sets {A, ¬A, {A, ¬A}} and {B, ¬B, {B, ¬B}}, respectively. This corresponds to incomplete information with regard to the probability assignments. In this case the updating assignment method requires iterations both for the individual updates, that is, for P′ = T(P, P_E1) and P″ = T(P′, P_E2), and for the updating process as a whole.

In the following examples the assignment over {A, ¬A, {A, ¬A}} is (0.8, 0.1, 0.1) and the assignment over {B, ¬B, {B, ¬B}} is (0.75, 0.25, 0). We give solutions for different a priori assignments.

With an a priori assignment over

{A ∧ B, A ∧ ¬B, ¬A ∧ B, ¬A ∧ ¬B, {A ∧ B, A ∧ ¬B}, {A ∧ B, ¬A ∧ B}}

which gives zero mass to the two nonsingleton elements, we obtain after several iterations

Pr(A ∧ B) = 0.6718    Pr(¬A ∧ ¬B) = 0.0329
Pr(A ∧ ¬B) = 0.2171   Pr(¬A ∧ B) = 0.0782

An a priori assignment of (0.4, 0.1, 0.4, 0.1, 0, 0) gives

Pr(A ∧ B) = 0.7261    Pr(¬A ∧ ¬B) = 0.0864
Pr(A ∧ ¬B) = 0.1636   Pr(¬A ∧ B) = 0.0239

An a priori assignment of (1/6, 1/6, 1/6, 1/6, 1/6, 1/6) gives

Pr(A ∧ B) = 0.6914    Pr(¬A ∧ ¬B) = 0.0247
Pr(A ∧ ¬B) = 0.1975   Pr(¬A ∧ B) = 0.0864


The general assignment method provides the solution possibilities for the conjunction, given the probability assignments for A and B used in these examples, as follows:

                  B (0.75)                ¬B (0.25)                    {B, ¬B} (0)
A (0.8)           A∧B: 0.75 − x − y       A∧¬B: 0.05 + x + y           0
¬A (0.1)          ¬A∧B: 0 + x             ¬A∧¬B: 0.1 − x               0
{A, ¬A} (0.1)     {A∧B, ¬A∧B}: 0 + y      {A∧¬B, ¬A∧¬B}: 0.1 − y       0

for 0 ≤ x ≤ 0.1 and 0 ≤ y ≤ 0.1, so that

Pr(A ∧ B) ∈ [0.55, 0.75], and similarly Pr(A ∧ ¬B) ∈ [0.05, 0.25], Pr(¬A ∧ B) ∈ [0, 0.2], and Pr(¬A ∧ ¬B) ∈ [0, 0.2].

With the assignment over {A, ¬A, {A, ¬A}} of (0.8, 0.1, 0.1), the assignment over {B, ¬B, {B, ¬B}} of (0.7, 0.2, 0.1), and the a priori assignment of (1/6, 1/6, 1/6, 1/6, 1/6, 1/6), we obtain

Pr(A ∧ B) = 0.6914    Pr(¬A ∧ ¬B) = 0.0247
Pr(A ∧ ¬B) = 0.1975   Pr(¬A ∧ B) = 0.0864

The probability of 0.1 associated with {B, ¬B} is effectively given to B and ¬B in proportions governed by the a priori distribution, which in this case is equally likely. Hence the solution is as above.

C. Convergence and Relative Entropy Considerations

In this article we present only a summary and discussion of the basic ideas concerning convergence and what the converged solution actually corresponds to.

Consider an a priori distribution, P0 = {p⁰}, over the power set of the frame of discernment, and suppose that this is to be updated using the iterative assignment method with evidences E1, E2, . . . , En, where each evidence Ei is a distribution over some partition of the power set of the frame of discernment. Suppose


further that the evidences E1, . . . , En can be combined using the general assignment method to give a nonempty set of possible solutions. The a priori P0 is successively updated, first using E1, then E2, and so on until finally it is updated using En, giving {p¹(E1)}, {p¹(E1E2)}, . . . , {p¹(E1E2 . . . En)}, respectively. The process is repeated with {p¹(E1E2 . . . En)} as the a priori to obtain {p²(E1)}, {p²(E1E2)}, . . . , {p²(E1E2 . . . En)}. This process is repeated, producing the sequence {p¹(E1E2 . . . En)}, {p²(E1E2 . . . En)}, . . . , {pʳ(E1E2 . . . En)}, . . . . This sequence converges to one of the set of possible solutions given by the general assignment method. Furthermore, the convergence does not depend on the order of the evidences E1, E2, . . . , En used in the updating process.

A single update of the probability distribution {p} using evidence Er, giving the probability distribution {p′} of the iterative assignment method, is equivalent to choosing {p′} so that the relative information

I(P, P′) = Σj p′j ln(p′j / pj)

is minimized subject to the constraint Er being satisfied.

In general a single update using evidence Er retains the evidence Er but destroys evidence relations retained on previous updates. The iteration process is required in order to obtain a final solution which retains all the evidences E1, E2, . . . , En.

The converged solution {p′} of the iterative assignment method used to update the a priori {p} with respect to evidences E1, E2, . . . , En minimizes the relative information

I(P, P′) = Σj p′j ln(p′j / pj)

subject to the constraints E1, E2, . . . , En being satisfied. For example, in the case of the conjunction problem discussed above, where an a priori distribution {x1, x2, x3, x4} over the frame of discernment {ab, a¬b, ¬ab, ¬a¬b} was updated using evidences E1: Pr(a) = α and E2: Pr(b) = β to give the distribution {y1, y2, y3, y4}, the {yi} satisfy

MIN Σi yi ln(yi / xi) with respect to {yi}

subject to

y1 + y2 = α    y1 + y3 = β    y1 + y2 + y3 + y4 = 1

Thus, introducing multipliers λ1, λ2, λ3 for the three constraints,

ln(y1/x1) + λ1 + λ2 + λ3 = 0        (1)
ln(y2/x2) + λ1 + λ3 = 0             (2)
ln(y3/x3) + λ2 + λ3 = 0             (3)
ln(y4/x4) + λ3 = 0                  (4)

and therefore

(y1 y4)/(y2 y3) = (x1 x4)/(x2 x3)

Also, writing k = e^(−λ3), Eq. (4) gives y4 = k·x4. Therefore, from the constraints,

y1 = α + β + k·x4 − 1    y2 = 1 − β − k·x4    y3 = 1 − α − k·x4    y4 = k·x4

Therefore, since 0 ≤ yi ≤ 1,

MAX{1 − α − β, 0} ≤ k·x4 ≤ MIN{1 − α, 1 − β}

so that

MAX{α + β − 1, 0} ≤ y1 ≤ MIN{α, β}
MAX{α − β, 0} ≤ y2 ≤ MIN{α, 1 − β}
MAX{β − α, 0} ≤ y3 ≤ MIN{β, 1 − α}
MAX{1 − α − β, 0} ≤ y4 ≤ MIN{1 − α, 1 − β}

which is the solution given by the general assignment method. The solution given by the iterative assignment method, for any given α and β, also satisfies Eqs. (1) to (4) given above.
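The claim can be checked numerically; the following sketch (SciPy assumed available, not the paper's own method) minimizes I(P, P′) directly and recovers the iterative assignment solution of Section IV.B.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([0.1, 0.4, 0.4, 0.1])   # prior over {ab, a¬b, ¬ab, ¬a¬b}
alpha, beta = 0.9, 0.75              # E1: Pr(a), E2: Pr(b)

def relative_information(yv):
    yv = np.clip(yv, 1e-12, 1.0)
    return float(np.sum(yv * np.log(yv / x)))

constraints = [
    {"type": "eq", "fun": lambda yv: yv[0] + yv[1] - alpha},  # Pr(a)
    {"type": "eq", "fun": lambda yv: yv[0] + yv[2] - beta},   # Pr(b)
    {"type": "eq", "fun": lambda yv: yv.sum() - 1.0},
]
res = minimize(relative_information, x, bounds=[(0.0, 1.0)] * 4,
               constraints=constraints)
print(np.round(res.x, 4))   # ≈ [0.6524 0.2476 0.0976 0.0024]
```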

D. A Crime Problem

Four persons, a, b, c, and d, are the possible suspects of a crime. The persons {a, b} are left-handed and the persons {a, c} are female. An a priori assignment over {a, b, c, d} for the person being the actual criminal is given.


The estimated probability that the criminal is left-handed is 0.8, say evidence E1, and that the criminal is female is 0.7, say evidence E2. We use the updating assignment method to update the a priori assignment to a final assignment, first updating using E1 and then updating again using E2. With the convergence of the iteration procedure used, the order of presenting the evidences does not matter: P_{a,E1,E2} = P_{a,E2,E1}.

For example,

a priori: Pr(a) = 1/4, Pr(b) = 1/4, Pr(c) = 1/4, Pr(d) = 1/4
gives Pr′(a) = 0.56, Pr′(b) = 0.24, Pr′(c) = 0.14, Pr′(d) = 0.06

in a one-step iteration. This is the solution that the Shafer-Dempster method gives, resulting as a consequence of choosing an equally likely prior assignment. Using Jeffrey's formula will also give this result, since it is equivalent to the updating assignment method when this method iterates in one step.

a priori: Pr(a) = 0.1, Pr(b) = 0.1, Pr(c) = 0.1, Pr(d) = 0.7
gives Pr′(a) = 0.6306, Pr′(b) = 0.1694, Pr′(c) = 0.0694, Pr′(d) = 0.1306

Several iterations are required to obtain this result. The iterations are summarized in the following table, which shows the assignment after each complete pass of updating with E1 and then E2, together with the value of Pr({a, b}) implied by each column; this marginal converges to the evidence value 0.8 (the marginal Pr({a, c}) converges similarly to 0.7):

Atoms       A priori   Update   Update   Update   Update   Update
a           0.1        0.6588   0.6350   0.6312   0.6307   0.6306
b           0.1        0.2087   0.1748   0.1702   0.1695   0.1694
c           0.1        0.0412   0.0650   0.0688   0.0693   0.0694
d           0.7        0.0913   0.1252   0.1298   0.1305   0.1306

Pr({a, b})  0.2        0.8675   0.8098   0.8014   0.8002   0.8000

Presenting the evidences in the reverse order gives, after the first pass, (0.7226, 0.0774, 0.1143, 0.0857), and converges to the same final assignment.

The solutions for any a priori assignment must lie in the intervals given by the general assignment method applied to E1 and E2 alone:

                  {a, c} (0.7)      {b, d} (0.3)
{a, b} (0.8)      {a}: 0.56 + x     {b}: 0.24 − x
{c, d} (0.2)      {c}: 0.14 − x     {d}: 0.06 + x

where −0.06 ≤ x ≤ 0.14.
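The crime problem runs directly through the kinematics sketch of Section IV.A, assuming it is in scope:

```python
f = frozenset
a, b, c, d = f({"a"}), f({"b"}), f({"c"}), f({"d"})
prior = {a: 0.1, b: 0.1, c: 0.1, d: 0.7}
e1 = {a | b: 0.8, c | d: 0.2}   # left-handed
e2 = {a | c: 0.7, b | d: 0.3}   # female
p = kinematics(prior, [e1, e2])
print([round(p[k], 4) for k in (a, b, c, d)])
# [0.6306, 0.1694, 0.0694, 0.1306]
```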

E. A Chess Problem

A group of players are to enter a tournament. Each player is male (m) or female (w), local (h) or an out-of-towner (g), and junior (j) or senior (s). An a priori judgment on each of their chances of winning can be given. The tournament is played and you are told the probability Pr(h) that the winner was a local player and the probability Pr(j) that the winner was a junior. What is the probability that the winner is male? The following database is given:

Player   Sex   Locality   Age   A priori
John     m     h          j     0.1
Mary     w     h          j     0.1
Anne     w     h          j     0.1
Bill     m     g          j     0.1
Harry    m     g          j     0.1
Trevor   m     h          s     0.1
Bruce    m     h          s     0.1
Pat      w     g          s     0.1
Gloria   w     g          s     0.1
Helen    w     g          s     0.1

From this we can determine the a priori assignment over the frame of discernment {mjh, msh, mjg, msg, wjh, wsh, wjg, wsg} as (0.1, 0.2, 0.2, 0, 0.2, 0, 0, 0.3).

We wish to update this assignment with evidence E1 which says that Pr(h) = 0.8 and then with evidence E2 which says that Pr(j) = 0.7.

A suitable tableau for this problem is given below. The four quarters of the usual tableau are independent apart from the transfer of update values in the upper part to the a priori values in the lower part.


        Pr(h): h                         Pr(g): g
        a priori    update               a priori    update
mjh     xh1         yh1          mjg     xg1         yg1
msh     xh2         yh2          msg     xg2         yg2
wjh     xh3         yh3          wjg     xg3         yg3
wsh     xh4         yh4          wsg     xg4         yg4

where yhk = K1·xhk·Pr(h) and ygk = K2·xgk·Pr(g), with
K1 = 1/(xh1 + xh2 + xh3 + xh4) and K2 = 1/(xg1 + xg2 + xg3 + xg4).

        Pr(j): j                         Pr(s): s
        a priori    update               a priori    update
mjg     yg1         zj1          msg     yg2         zs1
mjh     yh1         zj2          msh     yh2         zs2
wjg     yg3         zj3          wsg     yg4         zs3
wjh     yh3         zj4          wsh     yh4         zs4

where zjk = K1·y_k·Pr(j) and zsk = K2·y_k·Pr(s), the y_k (with _ standing for g or h) being the a priori values in the column to the left, and
K1 = 1/(yg1 + yh1 + yg3 + yh3) and K2 = 1/(yg2 + yh2 + yg4 + yh4).

The implied locality probabilities after the j/s update are

g = zj1 + zj3 + zs1 + zs3    h = zj2 + zj4 + zs2 + zs4

The a priori {xh1, xh2, xh3, xh4, xg1, xg2, xg3, xg4} is then replaced by the final update {zj2, zs2, zj4, zs4, zj1, zs1, zj3, zs3} and the process repeated until convergence is obtained, when g = Pr(g) and h = Pr(h).

There is an obvious similarity between this computation and that of a neural net. A general diagnostic problem is analogous to this problem. The hypotheses or diseases correspond to m and w. The {h, g, j, s} correspond to symptoms. Historical evidence can be used to choose the a priori assignment, and this corresponds to choosing the weights in a neural net. The probabilities of the patient having certain symptoms are presented as input to the net and the probabilities for the various hypotheses determined. The symptom probabilities may come from measurements or other computation nets.
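The split tableau above is an economized version of kinematics on the full eight-atom frame; running the sketches from earlier sections (assumed in scope) gives the same answer:

```python
f = frozenset
atoms = ["mjh", "msh", "mjg", "msg", "wjh", "wsh", "wjg", "wsg"]
prior = {f({"mjh"}): 0.1, f({"msh"}): 0.2, f({"mjg"}): 0.2,
         f({"wjh"}): 0.2, f({"wsg"}): 0.3}   # zero-mass atoms omitted
h_set = f({"mjh", "msh", "wjh", "wsh"})
j_set = f({"mjh", "mjg", "wjh", "wjg"})
g_set = f(atoms) - h_set
s_set = f(atoms) - j_set
e1 = {h_set: 0.8, g_set: 0.2}   # Pr(h) = 0.8
e2 = {j_set: 0.7, s_set: 0.3}   # Pr(j) = 0.7
p = kinematics(prior, [e1, e2])
pr_m = sum(v for k, v in p.items() if min(k).startswith("m"))
print(round(pr_m, 4))           # 0.518
```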

First iteration:

0.8 h                            0.2 g
        a priori   update                a priori   update
mjh     0.1        0.16          mjg     0.2        0.08
msh     0.2        0.32          msg     0          0
wjh     0.2        0.32          wjg     0          0
wsh     0          0             wsg     0.3        0.12

K1 = 2                           K2 = 2


0.7 j                            0.3 s
        a priori   update                a priori   update
mjg     0.08       0.1           msg     0          0
mjh     0.16       0.2           msh     0.32       0.2182
wjg     0          0             wsg     0.12       0.0818
wjh     0.32       0.4           wsh     0          0

K1 = 1.7857                      K2 = 2.2727

h = 0.8182    g = 0.1818

Second iteration:

0.8 h                            0.2 g
        a priori   update                a priori   update
mjh     0.2        0.1956        mjg     0.08       0.0989
msh     0.2182     0.2133        msg     0          0
wjh     0.4        0.3912        wjg     0          0
wsh     0          0             wsg     0.0818     0.1011

K1 = 1.2223                      K2 = 6.1805

0.7 j                            0.3 s
        a priori   update                a priori   update
mjg     0.0989     0.1001        msg     0          0
mjh     0.1956     0.1997        msh     0.2133     0.2035
wjg     0          0             wsg     0.1011     0.0965
wjh     0.3911     0.3994        wsh     0          0

K1 = 1.4586                      K2 = 3.1808

h = 0.8026    g = 0.1974

In this particular example accurate arithmetic should be performed. Approximations can cause the iterative process to converge to a near solution corresponding to a perturbed a priori assignment from the one used here.

After several iterations the following tableau is obtained:

0.8 h                            0.2 g
        a priori   update                a priori   update
mjh     0.1964     0.1964        mjg     0.1108     0.1108
msh     0.2108     0.2108        msg     0          0
wjh     0.3928     0.3928        wjg     0          0
wsh     0          0             wsg     0.0892     0.0892

K1 = 1.2500                      K2 = 5.0003

0.7 j                            0.3 s
        a priori   update                a priori   update
mjg     0.1108     0.1108        msg     0          0
mjh     0.1964     0.1964        msh     0.2108     0.2108
wjg     0          0             wsg     0.0892     0.0892
wjh     0.3928     0.3928        wsh     0          0

K1 = 1.4286                      K2 = 3.3333

h = 0.8    g = 0.2

Pr(m) = 0.5180

We give further examples for the same a priori assignment. With Pr(h) = 0.01 and Pr(j) = 0.01 we obtain

Pr(m) = 0.0197    Pr(w) = 0.9803

which is intuitively correct, since it is then almost certain that an out-of-town senior is the winner, and all out-of-town seniors are female.

If the evidence probabilities for E1 are given over the set {h, g, {h, g}} then an extra column is used in the first table. This corresponds to the situation in which a certain degree of support is given to h and a certain support to g, but these supports do not have to add up to 1. Total ignorance about h or g would correspond to a support of 0 for both h and g and a support of 1 for {h, g}. The final tableau for this example is given below. There will be an entry in the {h, g} update column for each member of the set {mjh, msh, wjh, wsh, mjg, msg, wjg, wsg}, but these can be added to the update entries in the other columns. For example, the entry in the g column and wsg row is 8.9979 × 0.0519 × 0.1 + 0.0519 × 0.1.

0.8 h
         a priori   update
mjh      0.2137     0.2137
msh      0.2480     0.2481
wjh      0.4271     0.4271
wsh      0          0
K1 = 1.1250

0.1 g
         a priori   update
mjg      0.05918    0.0592
msg      0          0
wjg      0          0
wsg      0.05195    0.05195
K2 = 8.9979

0.1 {h, g}
K = 1

(The entries of the {h, g} column, a priori value times 0.1, have been added into the update columns above.)


0.7 j
         a priori   update
mjg      0.0592     0.0592
mjh      0.2137     0.2137
wjg      0          0
wjh      0.4271     0.4271
K1 = 1.4286

0.3 s
         a priori   update
msg      0          0
msh      0.2481     0.2481
wsg      0.0519     0.0520
wsh      0          0
K2 = 3.3334

h = 0.8889    g = 0.1111

P'r(m) = 0.5209

It should be noted that the probability of 0.1 associated with {h, g} is shared between h and g, the amounts depending on the a priori distribution over h and g.

V. DECOMPOSITION

A. Partial Models

For the iteration to converge to a solution which retains the updating evidence it is necessary to give nonzero probabilities to each element of the complete product space of all atoms being considered. For example, suppose that the binary variables of interest are X1, . . . , Xn; then an element of the a priori space is X1X2 . . . Xn where each Xi is either true or false. There are 2^n possible elements and each of these must be associated with a nonzero a priori probability to ensure that the updating evidences are retained in the converged solution of the iteration method. If only a subset of this space is used as the a priori space then the method can converge to a solution which does not retain the individual evidences. This will happen when the evidences are inconsistent with what is possible with respect to the a priori assignment. We will illustrate this with a medical diagnosis type problem.
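A small demonstration of this failure, reusing update() from the earlier sketch with ad hoc atom names: if the element a¬b is excluded from the a priori space then a implies b, so the evidences Pr(a) = 0.9 and Pr(b) = 0.05 cannot both be retained.

prior = {'ab': 0.3, '-ab': 0.3, '-a-b': 0.4}   # partial model: a-b missing
A, notA = {'ab'}, {'-ab', '-a-b'}
B, notB = {'ab', '-ab'}, {'-a-b'}
for _ in range(100):
    prior = update(prior, [(A, 0.9), (notA, 0.1)])
    prior = update(prior, [(B, 0.05), (notB, 0.95)])
print(prior['ab'])   # at most 0.05 after the last update: Pr(a) = 0.9 is lost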

Consider the case of four possible hypotheses H1, H2, H3, H4 and four possible symptoms S1, S2, S3, and S4, linked as in the following diagram.

[Diagram not reproduced: H1 is linked to S1 and S2, H2 to S2 and S3, and H3 to S4; H4 has no links. The diagram carries the a priori probabilities of the hypotheses and a table of the conditional probabilities Pr(Si | Hj) attached to the links. The values consistent with the a priori assignment given below are Pr(H1) = Pr(H2) = Pr(H3) = 0.1, Pr(H4) = 0.7, and Pr(s1 | h1) = 0.9, Pr(s2 | h1) = 0.3, Pr(s2 | h2) = 0.4, Pr(s3 | h2) = 0.6, Pr(s4 | h3) = 0.7.]


The hypotheses are mutually exclusive and exhaustive. H4 corresponds to healthy. We therefore consider all the possible instantiations of HiS1S2S3S4 for i = 1, 2, 3, 4. The focal elements for the a priori assignment are therefore

(h1s1s2, h1s2, h1s1, h1, h2s2s3, h2s3, h2s2, h2, h3s4, h3, h4)

where only the positive instantiations of the variables are recorded. If, for example, S2 is true then s2 is recorded, while if it is false then nothing is recorded. The a priori probabilities for these elements are determined using the formula

Pr(HiS1S2S3S4) = Pr(S1S2S3S4 | Hi) Pr(Hi)
              = Pr(S1 | Hi) Pr(S2 | Hi) Pr(S3 | Hi) Pr(S4 | Hi) Pr(Hi)

and are given respectively as

(0.027, 0.003, 0.063, 0.007, 0.024, 0.036, 0.016, 0.024, 0.07, 0.03, 0.7)
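The list can be generated mechanically, as the following sketch shows. The conditional values used are the ones quoted with the diagram placeholder above; since the original table is not reproduced, treat them as reconstructed assumptions.

from itertools import product

pr_h = {'h1': 0.1, 'h2': 0.1, 'h3': 0.1, 'h4': 0.7}
links = {'h1': {'s1': 0.9, 's2': 0.3},   # Pr(si | hi) for linked symptoms
         'h2': {'s2': 0.4, 's3': 0.6},
         'h3': {'s4': 0.7},
         'h4': {}}

assignment = {}
for h, ls in links.items():
    syms = sorted(ls)
    for truth in product([True, False], repeat=len(syms)):
        # label records only the positive instantiations, e.g. 'h1s1s2'
        label = h + ''.join(s for s, t in zip(syms, truth) if t)
        p = pr_h[h]
        for s, t in zip(syms, truth):
            p *= ls[s] if t else 1 - ls[s]
        assignment[label] = round(p, 4)

print(assignment)   # {'h1s1s2': 0.027, 'h1s1': 0.063, 'h1s2': 0.003, ...}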

This is a partial model since many possible focal elements using the variables H1, H2, H3, H4, S1, S2, S3, S4 have been excluded. For example h1s1s2s3s4 is excluded, as is h1h2h3h4s1s2s3s4. The latter we exclude because we do not allow multiple diseases, and the former because there is no link from H1 to S3 or S4.

The a priori assignment is updated stage by stage using first Pr(s1), then Pr(s2), then Pr(s3), and finally Pr(s4). Because we are using a partial model, it is possible to choose these probabilities so that the iteration cannot converge to a solution which preserves Pr(s1), Pr(s2), Pr(s3). The method will then converge to a solution which preserves these evidence probabilities as best it can.

The symptoms S1, S2, S3, S4 should be thought of as hidden symptoms whose probabilities are obtained using a model containing these symptoms and actual measurements. The hidden symptom probabilities are thus obtained using the iteration method on this model. There may be several layers of hidden symptoms.

This type of hierarchical modeling is necessary to reduce the computational task. The derived probabilities for S1, S2, S3, and S4 should be consistent with what makes sense with respect to the model above. It is like obtaining information about the patient such that we know the patient belongs to a certain group of patients, and for this group we know that the probability of a person in the group having s1 is Pr(s1).


B. Network Example

[Diagram not reproduced: the network for this example. The surviving fragment shows a node Z with parent nodes X and Y; the label a, b, c, d attached to such a node]

means Pr(z | x, y) = a, Pr(z | ¬x, y) = b, Pr(z | x, ¬y) = c, Pr(z | ¬x, ¬y) = d.

The problem is broken down into the subproblems 1, 2, 3, 4, and 5. Problems 1 and 2 are first solved. Problem 1 is solved using the variables {D, E, G, H} and Problem 2 using the variables {I, F}. The probabilities P'r(d) and P'r(e) are calculated from the solution of Problem 1 and the probability P'r(f) from the solution of Problem 2. These are then used to solve Problems 3 and 4, using variables {B, D} for Problem 3 and variables {C, E, F} for Problem 4. The probability P'r(b) is determined from the solution of Problem 3 and P'r(c) from the solution of Problem 4. These are then used to solve Problem 5 using variables {A, B, C} and hence deduce P'r(a).

We can denote this decomposition as:

ABC

BD CEF

DEGH FI

The a priori probabilities for the tableau of each problem are easily calculated using elementary probability theory. For example, we calculate Pr(DEGH) as follows:

Pr(DEGH) = Pr(GH | DE) Pr(DE)
         = Pr(G | DE) Pr(H | DE) Pr(DE)
         = Pr(G | D) Pr(H | DE) Pr(E | D) Pr(D)

where, for example,

Pr(C | D) = Pr(C | AD) Pr(A | D) + Pr(C | ¬AD) Pr(¬A | D)
          = Pr(C | A) Pr(A | D) + Pr(C | ¬A) Pr(¬A | D)

and

Pr(D) = Pr(D | B) Pr(B) + Pr(D | ¬B) Pr(¬B)

The variables can be instantiated to their actual values and the calculations made. For example, Pr(degh) is obtained by putting D = d, E = e, G = g, and H = h. The value Pr(d¬e¬gh) is obtained by putting D = d, E = ¬e, G = ¬g, and H = h.
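For instance, the following sketch reproduces two entries of the Pr(DEGH) table given below. The conditional values are not all stated explicitly in the text; the ones marked as inferred were read back from that table and should be treated as assumptions.

pr_d = 0.22
pr_e_d = {True: 0.3673, False: 0.1323}             # Pr(e | D), from Pr(E | D)
pr_g_d = {True: 0.8, False: 0.1}                   # Pr(g | D), inferred
pr_h_de = {(True, True): 0.6, (True, False): 0.4,  # Pr(h | D, E), inferred
           (False, True): 0.1, (False, False): 0.2}

def pr_degh(d, e, g, h):
    # Pr(DEGH) = Pr(G | D) Pr(H | DE) Pr(E | D) Pr(D)
    p = pr_d if d else 1 - pr_d
    p *= pr_e_d[d] if e else 1 - pr_e_d[d]
    p *= pr_g_d[d] if g else 1 - pr_g_d[d]
    p *= pr_h_de[(d, e)] if h else 1 - pr_h_de[(d, e)]
    return p

print(round(pr_degh(True, True, True, True), 4))    # 0.0388 = Pr(degh)
print(round(pr_degh(True, False, False, True), 4))  # 0.0111 = Pr(d¬e¬gh)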

Similarly, for Problem 5,

Pr(ABC) = Pr(BC | A) Pr(A) = Pr(B | A) Pr(C | A) Pr(A)

and corresponding factorizations are used for Problems 2, 3, and 4.

If Pr(A) = 0.2 then

Pr(B):   b = 0.22    ¬b = 0.78
Pr(D):   d = 0.22    ¬d = 0.78


Pr(A | B):
          b        ¬b
   a      0.6364   0.0769
   ¬a     0.3636   0.9231

Pr(D | B):
          d    ¬d
   b      1    0
   ¬b     0    1

Pr(A | D):
          d        ¬d
   a      0.6364   0.0769
   ¬a     0.3636   0.9231

Pr(C | D):
          d        ¬d
   c      0.3818   0.0461
   ¬c     0.6182   0.9539

Pr(E | D):
          d        ¬d
   e      0.3673   0.1323
   ¬e     0.6327   0.8677

Pr(C):   c = 0.12     ¬c = 0.88
Pr(F):   f = 0.048    ¬f = 0.952

Pr(DEGH):
   degh     0.0388    deg¬h    0.0259    de¬gh    0.0097    de¬g¬h    0.0065
   d¬egh    0.0445    d¬eg¬h   0.0668    d¬e¬gh   0.0111    d¬e¬g¬h   0.0167
   ¬degh    0.0010    ¬deg¬h   0.0093    ¬de¬gh   0.0093    ¬de¬g¬h   0.0836
   ¬d¬egh   0.0135    ¬d¬eg¬h  0.0541    ¬d¬e¬gh  0.1218    ¬d¬e¬g¬h  0.4873

Pr(CEF):
   cef      0.0384    ce¬f     0.0576    c¬ef     0.0096    c¬e¬f     0.0144
   ¬cef     0         ¬ce¬f    0.0880    ¬c¬ef    0         ¬c¬e¬f    0.7920

Pr(ABC):
   abc      0.084     ab¬c     0.056     a¬bc     0.036     a¬b¬c     0.024
   ¬abc     0         ¬ab¬c    0.08      ¬a¬bc    0         ¬a¬b¬c    0.72

Pr(FI):
   fi       0.0432    f¬i      0.0048    ¬fi      0         ¬f¬i      0.952

Pr(BD):
   bd       0.22      b¬d      0         ¬bd      0         ¬b¬d      0.78

The following results are obtained after several iterations:

IF    P'r(g) = 0.1    P'r(¬g) = 0.9    P'r({g, ¬g}) = 0
      P'r(h) = 0.6    P'r(¬h) = 0.4    P'r({h, ¬h}) = 0
      P'r(i) = 0.2    P'r(¬i) = 0.8    P'r({i, ¬i}) = 0

THEN  P'r(a) = 0.2755    P'r(b) = 0.1445    P'r(c) = 0.2357
      P'r(d) = 0.1445    P'r(e) = 0.1635    P'r(f) = 0.2040

C. Pearl’s Cancer Example

[Diagram not reproduced: Pearl's network for this example, with the region covered by Problem 2 marked. The node labels surviving from it are "Metastatic Cancer" (A) and "Increased Total Serum Calcium" (B); Problem 1 covers the variables {B, C, D, E} and Problem 2 the variables {A, B, C}.]

Pr(A):     Pr(a) = 0.2
Pr(B | A): Pr(b | a) = 0.8    Pr(b | ¬a) = 0.2
Pr(C | A): Pr(c | a) = 0.2    Pr(c | ¬a) = 0.05


Pr(D | BC): Pr(d | bc) = 0.8     Pr(d | ¬bc) = 0.8
            Pr(d | b¬c) = 0.8    Pr(d | ¬b¬c) = 0.05
Pr(E | C):  Pr(e | c) = 0.8      Pr(e | ¬c) = 0.6

We solve this problem in two stages. First Problem 1 is solved for P'r(BC) and this is used as an input to Problem 2 to solve for P'r(A).

Problem 1.

Pr(BCDE) = Pr(E | BCD) Pr(D | BC) Pr(BC)
         = Pr(E | C) Pr(D | BC) Pr(BC)

where

Pr(BC) = Pr(BC | A) Pr(A) + Pr(BC | ¬A) Pr(¬A)
       = Pr(B | A) Pr(C | A) Pr(A) + Pr(B | ¬A) Pr(C | ¬A) Pr(¬A)

Pr(BCDE):
   bcde     0.0256    bcd¬e    0.0064    bc¬de    0.0064    bc¬d¬e    0.0016
   b¬cde    0.1344    b¬cd¬e   0.0896    b¬c¬de   0.0336    b¬c¬d¬e   0.0224
   ¬bcde    0.0256    ¬bcd¬e   0.0064    ¬bc¬de   0.0064    ¬bc¬d¬e   0.0016
   ¬b¬cde   0.0192    ¬b¬cd¬e  0.0128    ¬b¬c¬de  0.3648    ¬b¬c¬d¬e  0.2432

We update this a priori distribution using first P'r(D) and then P'r(E).

Problem 2.

Pr(ABC) = Pr(BC | A) Pr(A) = Pr(B | A) Pr(C | A) Pr(A)

Pr(ABC):
   abc      0.032     ab¬c     0.128     a¬bc     0.008     a¬b¬c     0.032
   ¬abc     0.008     ¬ab¬c    0.152     ¬a¬bc    0.032     ¬a¬b¬c    0.608

We can update this a priori distribution using the P'r(BC) given from Problem 1 and hence obtain P'r(A).

Example 1. If we take

P'r(D): P'r(d) = 0    P'r(E): P'r(e) = 1

the iterative assignment method gives P'r(a) = 0.0934. The method converges in one step.


Example 2. If we take

P'r(D): P'r(d) = 0.1    P'r(E): P'r(e) = 0.9

the iterative assignment method gives P'r(a) = 0.1297. Several iterations are required for convergence.

VI. PROBABILITY LOGIC

A. Logic Problem Revisited

Given

Pr(A ⊃ B) = 0.6667    Pr(A) = 0.8

we are required to determine Pr(B). The general assignment method gives the support pair [0.4667, 0.6667] for Pr(B).

We will use the probability kinematics approach to solve this problem. An a priori assignment is given to the frame of discernment {ab, ¬ab, a¬b, ¬a¬b} and this is first updated, using the iterative assignment method, with the evidence corresponding to Pr(A ⊃ B) = 0.6667. This is further updated using the evidence Pr(A) = 0.8. The a priori assignment will focus the solution for Pr(B) to a point in the range [0.4667, 0.6667] given by the support pair above. We will choose an equally likely assignment for the a priori. The solution is given below.
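The same update() sketch from earlier runs this computation (the atom names are ad hoc):

prior = {'ab': 0.25, '-ab': 0.25, 'a-b': 0.25, '-a-b': 0.25}  # equally likely
IMP, NIMP = {'ab', '-ab', '-a-b'}, {'a-b'}    # A ⊃ B and its complement
A, NA = {'ab', 'a-b'}, {'-ab', '-a-b'}
for _ in range(200):
    prior = update(prior, [(IMP, 0.6667), (NIMP, 0.3333)])
    prior = update(prior, [(A, 0.8), (NA, 0.2)])
print(prior['ab'] + prior['-ab'])             # Pr(B) -> 0.5667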

              0.6667               0.3333
A priori      {ab, ¬ab, ¬a¬b}      a¬b          update
0.25   ab     0.2222               0            0.2222
0.25   ¬ab    0.2222               0            0.2222
0.25   a¬b    0                    0.3333       0.3333
0.25   ¬a¬b   0.2222               0            0.2222
K1 = 1.3333                        K2 = 4

              0.8                  0.2
A priori      {ab, a¬b}            {¬ab, ¬a¬b}  update
0.2222  ab    0.32                 0            0.32
0.2222  ¬ab   0                    0.1          0.1
0.3333  a¬b   0.48                 0            0.48
0.2222  ¬a¬b  0                    0.1          0.1
K1 = 1.8                           K2 = 2.25

Pr(A) = 0.8    Pr(A ⊃ B) = 0.52    Pr(B) = 0.42

Initial Tableau

Note that the solution for B in the initial tableau is not contained in the support pair interval found using the general assignment method. Also, the first evidence is not retained. Iteration will correct this. After several iterations the final tableau is obtained.

              0.6667               0.3333
A priori      {ab, ¬ab, ¬a¬b}      a¬b          update
0.4667  ab    0.4667               0            0.4667
0.1     ¬ab   0.1                  0            0.1
0.3333  a¬b   0                    0.3333       0.3333
0.1     ¬a¬b  0.1                  0            0.1
K1 = 1.5                           K2 = 3

Pr(A ⊃ B) = 0.6667

              0.8                  0.2
A priori      {ab, a¬b}            {¬ab, ¬a¬b}  update
0.4667  ab    0.4667               0            0.4667
0.1     ¬ab   0                    0.1          0.1
0.3333  a¬b   0.3333               0            0.3333
0.1     ¬a¬b  0                    0.1          0.1
K1 = 1.25                          K2 = 5

Pr(A) = 0.8    Pr(B) = 0.5667

Final Tableau

The solution for Pr(B) given by the final tableau of the iterative assignment method with an equally likely a priori assignment lies midway in the interval given by the general assignment method, [0.4667, 0.6667].

VII. RELATION TO FRIL AND SUPPORT LOGIC

A. FRIL and Support Logic Programming

The AI language FRIL [24] is based on support logic programming given by Baldwin [25, 26]. The inference mechanism of support logic programming is based on the theorem of total probabilities if the conditional probabilities associated with a rule and the probabilities of the head and body are for the same sample space. In the case when the conditional probabilities of the rule are determined from one sample space and the probability of the body from some subset of this, then the inference rule of FRIL is equivalent to using Jeffrey's rule.


B. FRIL and the Iterative Assignment Method

Consider a rule of the form

H :- a, b, c : ( Pr(h | abc), Pr(h | ¬(abc)) )

and facts

a : Pr(a)    b : Pr(b)    c : Pr(c)

where the rule is interpreted as: if a and b and c then h with a probability Pr(h | abc), while if ¬(a and b and c) then h with a probability Pr(h | ¬(abc)).

Support logic programming deduces that the probability of the head is given by

Pr(h) = Pr(h | abc) Pr(abc) + Pr(h | ¬(abc)) Pr(¬(abc))

where Pr(abc) is determined using a maximum entropy argument or independence assumption from

Pr(abc) = Pr(a) Pr(b) Pr(c)

The theory allows support pairs to be used instead of the exact probabilities used here, but this does not affect the discussion.

This is equivalent to the solution given by the iterative assignment method when the a priori distribution over {habc, hab¬c, ha¬bc, ha¬b¬c, h¬abc, h¬ab¬c, h¬a¬bc, h¬a¬b¬c, ¬habc, ¬hab¬c, ¬ha¬bc, ¬ha¬b¬c, ¬h¬abc, ¬h¬ab¬c, ¬h¬a¬bc, ¬h¬a¬b¬c} is chosen so as to maximize the entropy subject to the conditional probability constraints given above.

That is, use as the a priori over these sixteen atoms the distribution {pi} such that −Σ pi ln pi is maximized subject to the Pr(h | abc) and Pr(h | ¬(abc)) values given in the above rule. This a priori is then updated using the iterative assignment method, first with the evidence Pr(a) given above, second with the evidence Pr(b), and third with the evidence Pr(c).

This updating procedure for this a priori will converge in one step and gives the same solution as the use of Jeffrey's rule where the probability of the body of the rule is computed as the product of the individual probabilities of the conjuncts of the body. Thus the solution agrees with that of FRIL.

For example from the support logic programming statements


H :- a, b, c : [0.74], [0.58]
a : 0.9
b : 0.8
c : 0.7

FRIL deduces

h : 0.66064

The maximum entropy distribution over the sixteen atoms listed above, subject to Pr(h | abc) = 0.74 and Pr(h | ¬(abc)) = 0.58, is given by {0.0925, 0.0725, 0.0725, 0.0725, 0.0725, 0.0725, 0.0725, 0.0725, 0.0325, 0.0525, 0.0525, 0.0525, 0.0525, 0.0525, 0.0525, 0.0525}.

The iterative assignment method using this a priori distribution and updating with the evidences E1: Pr(a) = 0.9, E2: Pr(b) = 0.8, E3: Pr(c) = 0.7 converges in one step to the solution {0.37296, 0.12528, 0.07308, 0.03132, 0.03248, 0.01392, 0.00812, 0.00348, 0.13104, 0.09072, 0.05292, 0.02268, 0.02352, 0.01008, 0.00588, 0.00252}, giving Pr(h) = 0.66064, the same as given by FRIL.
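As a quick check of the deduced value (a sketch; this is just the total probability formula given earlier):

pa, pb, pc = 0.9, 0.8, 0.7
ph_body, ph_notbody = 0.74, 0.58
p_body = pa * pb * pc               # independence assumption for the body
print(ph_body * p_body + ph_notbody * (1 - p_body))     # 0.66064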

VIII. USE OF FUZZY SETS

A. Fuzzy Instantiations of Variables

Let the frame of discernment H be all combinations of instantiations of the variables X1, X2, . . . , Xn, where the variable Xi can take any value in the set of values Vi, for all i.

If the cardinality of each Vi is the same, namely m, then there will be m^n elements in the frame of discernment; this number is 1000 for three variables each having one of 10 possible values. Consider for example the variables height, weight, and degree of fatness for English men. An element in the frame of discernment corresponds to "An English man with height h, weight w, and degree g of fatness" where h, w, and g are instantiations of the height, weight, and fatness variables, respectively. The a priori probability associated with this element corresponds to the probability that a man chosen at random will have height h, weight w, and fatness g. These three variables are interdependent, and even for this small dimensional problem the computational task of using the iterative assignment method is demanding. We cannot reduce the complexity of this problem by decomposition. A simplification of the computational effort can be obtained by using fuzzy sets as instantiations of the variables rather than precise values.

Instead of using the m possible instantiations for a variable X, we will use k possible fuzzy sets, where each fuzzy set f is defined on the space of possible values of X by giving its membership function Mf, which is the mapping


Mf : RX → [0, 1], where RX is the set of possible values of X.

Often RX will be a range of values r(X). This is the case in the above example with height, weight, and fatness variables.

Consider the example above where we use the fuzzy sets {v.short, short, average, tall, v.tall} for the height space, {v.light, light, average, heavy, v.heavy} for the weight space, and {v.thin, thin, medium, fat, v.fat} for the degree of fatness. The frame of discernment will now consist of 5^3 = 125 elements, which is a much more manageable size.

An a priori distribution can be defined over this frame of discernment by counting the proportion of English men in a sample population that have the height corresponding to the fuzzy set of the height variable, the weight corresponding to the fuzzy set of the weight variable, and the degree of fatness corresponding to the fuzzy set of the fatness variable in the particular label of the element of the frame of discernment being allocated its a priori probability. Objects in the sample population are matched to labels in the frame of discernment by matching the values of the features of the object to the fuzzy sets of the label. Suppose, for example, that an object has a value x of the variable X and x has a membership level greater than zero in the fuzzy sets f1, f2, and f3 in the set of fuzzy sets associated with variable X. Suppose further that these membership values are Mf1(x), Mf2(x), and Mf3(x), with Mf1(x) ≤ Mf2(x) ≤ Mf3(x); then a probability assignment is imposed on subsets of the frame of discernment. The voting model [27, 28] with the constant threshold assumption imposes the following probability assignment:

{f3} : Mf3(x) − Mf2(x)
{f2, f3} : Mf2(x) − Mf1(x)
{f1, f2, f3} : Mf1(x)
φ : 1 − Mf3(x)

where φ corresponds to a label not in the frame of discernment; 1 − Mf3(x) of the voting population do not accept that x satisfies any of the labels in the frame of discernment.

This method of assigning the a priori increases the cardinality of the frame of discernment, which we would like to avoid. In this example, the voting model says that Mf2(x) − Mf1(x) of the voting population accept that the occurrence of the value x means that either f2 or f3 occurred. The quantity Mf2(x) − Mf1(x) can be split equally between the two fuzzy sets f2 and f3, since we have no reason to choose one over the other. This process of distributing the assignment of a disjunctive set of fuzzy sets equally among the disjuncts leads to the following assignment in the above case:

f3 : [Mf3(x) − Mf2(x)] + [Mf2(x) − Mf1(x)]/2 + Mf1(x)/3 + [1 − Mf3(x)]/3
f2 : [Mf2(x) − Mf1(x)]/2 + Mf1(x)/3 + [1 − Mf3(x)]/3
f1 : Mf1(x)/3 + [1 − Mf3(x)]/3

The quantity [1 − Mf3(x)] is split equally amongst those labels with nonzero membership values for x.

As an example consider the frame of discernment {short, average, tall}. A sample population contains a person who has height 5 ft. 10 in., which has memberships, say, of 0, 0.9, 0.3 for the fuzzy sets short, average, tall, respectively. The assignment for this one person would be

short : 0
average : 0.6 + 0.15 + 0.05 = 0.8
tall : 0.15 + 0.05 = 0.2

As another example, consider the frame of discernment {f1, f2, f3, f4, f5}. The value u has membership levels 0.3, 0.4, and 0.7 with the fuzzy sets f1, f2, f3, respectively, and zero with f4 and f5. Then an object with value u will give the assignment:

f1 : 0.1 + 0.1 = 0.2
f2 : 0.05 + 0.1 + 0.1 = 0.25
f3 : 0.3 + 0.05 + 0.1 + 0.1 = 0.55
f4 : 0
f5 : 0
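Both examples follow from the same redistribution rule, which can be written compactly. The following is a hedged sketch; the function name and representation are illustrative.

def fuzzy_assignment(memberships):
    # memberships: label -> membership level of the observed value x.
    # Mass between successive membership levels goes to the disjunction of
    # labels at or above that level and is split equally among them; the
    # residue 1 - max level is split equally among all nonzero labels.
    nonzero = {l: m for l, m in memberships.items() if m > 0}
    out = dict.fromkeys(memberships, 0.0)
    prev = 0.0
    for level in sorted(set(nonzero.values())):
        block = [l for l, m in nonzero.items() if m >= level]
        for l in block:
            out[l] += (level - prev) / len(block)
        prev = level
    for l in nonzero:
        out[l] += (1 - prev) / len(nonzero)
    return out

print(fuzzy_assignment({'short': 0, 'average': 0.9, 'tall': 0.3}))
# {'short': 0.0, 'average': 0.8, 'tall': 0.2}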

B. Fuzzy Evidence

We give two examples to illustrate the updating of a priori assignments using evidence expressed in terms of fuzzy sets.

Example 1. Consider the frame of discernment {a, b, c, d} and the a priori assignment

{a, b} : 0.2    {b, c} : 0.4    {c, d} : 0.1    {d} : 0.3

We will update this a priori assignment using the evidence E

E: {c/0.4 + d/1} : 0.7    {a} : 0.3

We interpret the fuzzy set c/0.4 + d/1 using the voting model, which means that 60% of the voting population accept the evidence

{d} : 0.7    {a} : 0.3

and the remaining 40% accept the evidence

{c, d} : 0.7    {a} : 0.3

Therefore for 60% of the population the following update table holds


                        0.7             0.3
A priori    Atoms       {d}             {a}           Update
0.2         {a, b}      { } : 0         {a} : 0.3     0.3
0.4         {b, c}      { } : 0         { } : 0       0
0.1         {c, d}      {d} : 0.175     { } : 0       0
0.3         {d}         {d} : 0.525     { } : 0       0.7
K                       2.5             5

and for 40% of the population the following table holds

                        0.7               0.3
A priori    Atoms       {c, d}            {a}           Update
0.2         {a, b}      { } : 0           {a} : 0.3     0.3
0.4         {b, c}      {c} : 0.35        { } : 0       0.175
0.1         {c, d}      {c, d} : 0.0875   { } : 0       0.2625
0.3         {d}         {d} : 0.2625      { } : 0       0.2625
K                       1.25              5

In this table the 0.35 associated with { c } is equally distributed between { b, c } and { c , d } .

Therefore

Pr({a, b}) = 0.3
Pr({b, c}) = 0.6*0 + 0.4*0.175 = 0.07
Pr({c, d}) = 0.6*0 + 0.4*0.2625 = 0.105
Pr({d}) = 0.6*0.7 + 0.4*0.2625 = 0.525

We can combine the two tables above into one solution table as follows:

                     0.42           0.18          0.28             0.12
A priori   Atoms     {d}            {a}           {c, d}           {a}           Update
0.2        {a, b}    { } : 0        {a} : 0.18    { } : 0          {a} : 0.12    0.3
0.4        {b, c}    { } : 0        { } : 0       {c} : 0.14       { } : 0       0.07
0.1        {c, d}    {d} : 0.105    { } : 0       {c, d} : 0.035   { } : 0       0.105
0.3        {d}       {d} : 0.315    { } : 0       {d} : 0.105      { } : 0       0.525
K                    2.5            5             1.25             5

which in fact contracts to


                     0.42           0.3           0.28
A priori   Atoms     {d}            {a}           {c, d}           Update
0.2        {a, b}    { } : 0        {a} : 0.3     { } : 0          0.3
0.4        {b, c}    { } : 0        { } : 0       {c} : 0.14       0.07
0.1        {c, d}    {d} : 0.105    { } : 0       {c, d} : 0.035   0.105
0.3        {d}       {d} : 0.315    { } : 0       {d} : 0.105      0.525
K                    2.5            5             1.25

If we had used the evidence

{c/0.9 + d/1} : 0.7    {a} : 0.3

instead of that used above then we would have obtained the update

Pr({a, b}) = 0.3
Pr({b, c}) = 0.1*0 + 0.9*0.175 = 0.1575
Pr({c, d}) = 0.1*0 + 0.9*0.2625 = 0.23625
Pr({d}) = 0.1*0.7 + 0.9*0.2625 = 0.30625

In fact we can generalize to any fuzzy set c/x + d/1 where x ∈ [0, 1]. These results are shown in the table below.

x         0        0.1      0.2      0.3      0.4
{a, b}    0.3      0.3      0.3      0.3      0.3
{b, c}    0        0.0175   0.035    0.0525   0.07
{c, d}    0        0.02625  0.0525   0.07875  0.105
{d}       0.7      0.65625  0.6125   0.56875  0.525

x         0.5      0.6      0.7      0.8      0.9
{a, b}    0.3      0.3      0.3      0.3      0.3
{b, c}    0.0875   0.105    0.1225   0.14     0.1575
{c, d}    0.13125  0.1575   0.18375  0.21     0.23625
{d}       0.48125  0.4375   0.39375  0.35     0.30625

x         1
{a, b}    0.3
{b, c}    0.175
{c, d}    0.2625
{d}       0.2625

If the equally likely redistribution of the 0.35 associated with {c} amongst {b, c} and {c, d} is not assumed, then the following table replaces the second table above.


                        0.7               0.3
A priori    Atoms       {c, d}            {a}           Update
0.2         {a, b}      { } : 0           {a} : 0.3     0.3
0.4         {b, c}      {c} : 0.35        { } : 0       [0, 0.35]
0.1         {c, d}      {c, d} : 0.0875   { } : 0       [0.0875, 0.4375]
0.3         {d}         {d} : 0.2625      { } : 0       0.2625
K                       1.25              5

The support pairs [0, 0.35] and [0.0875, 0.4375] associated with {b, c} and {c, d} contain the probabilities associated with {b, c} and {c, d}, respectively, for the 40% of the voting population. The overall probabilities are therefore

Pr({a, b}) = 0.3
Pr({b, c}) ∈ 0.6*0 + 0.4*[0, 0.35] = [0, 0.14]
Pr({c, d}) ∈ 0.6*0 + 0.4*[0.0875, 0.4375] = [0.035, 0.175]
Pr({d}) = 0.6*0.7 + 0.4*0.2625 = 0.525

Example 2. Consider an a priori given over the frame of discernment {a, b, c, d}, namely

a : 0.2    b : 0.4    c : 0.1    d : 0.3

For example, a, b, c, d could be suspects of a crime. A piece of evidence might be that there is a 0.7 chance that the criminal was tall and a 0.3 chance that the criminal was short. From the heights of each of the suspects it is estimated that the possibility of a being tall is 0.4 and of b being tall is 1. The possibility of c being short is 0.6 and of d being short is 1. All other possibilities are 0. Therefore the evidence can be expressed as

{a/0.4 + b/1} : 0.7    {c/0.6 + d/1} : 0.3

We interpret these fuzzy sets as follows: 40% of the voting population accept a as being tall and everyone accepts b as being tall. Similarly, 60% of the population would accept c as being short and everyone would accept d as being short.

The update table is therefore

                0.7 {a/0.4 + b/1}               0.3 {c/0.6 + d/1}
A priori        60% {b}       40% {a, b}        40% {d}     60% {c, d}      Update
0.2    a        0             a : (1/3)*0.7     0           0               0.4*(1/3)*0.7
0.4    b        b : 0.7       b : (2/3)*0.7     0           0               0.6*0.7 + 0.4*(2/3)*0.7
0.1    c        0             0                 0           c : (1/4)*0.3   0.6*(1/4)*0.3
0.3    d        0             0                 d : 0.3     d : (3/4)*0.3   0.4*0.3 + 0.6*(3/4)*0.3
K               2.5           5/3               10/3        2.5


so that


Pr(a) = 0.0933    Pr(b) = 0.6067    Pr(c) = 0.045    Pr(d) = 0.255

This example illustrates the use of fuzzy sets to express evidences. Almost any property which one might wish to associate with the suspects will apply to a given individual to a certain degree, which is not necessarily 0 or 1. For example, we could use such statements as "the criminal was religious," "the criminal was short-sighted," "the criminal was professional," "the criminal was near to the place of the crime at the time of the crime," or "the criminal was experienced in the particular crime committed." For all these cases each suspect will satisfy the conditions of the sentence to a certain degree.

In a future article the general theory for the use of fuzzy sets in evidential reasoning will be given. Here we simply wish to show the importance of fuzzy sets in such applications and that the iterative assignment method is easily extended to cope with fuzzy sets defined over the frame of discernment.

The above example can be solved using the following table, which illustrates how the example can be generalized to cases where the a priori is given over a partition and the evidences are expressed as fuzzy sets on this partition.

             0.42       0.28        0.12        0.18
A priori     {b}        {a, b}      {d}         {c, d}      Update
0.2    a     0          0.0933      0           0           0.0933
0.4    b     0.42       0.1867      0           0           0.6067
0.1    c     0          0           0           0.045       0.045
0.3    d     0          0           0.12        0.135       0.255
K            2.5        1.6667      3.3333      2.5

The probability associated with a column is the product of the probability of the evidence to which the column belongs and the proportion of the voting population which accepts the label of the column.
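In terms of the earlier update() sketch, each fuzzy evidence contributes one block per voting group, weighted as just described (a hedged illustration):

prior = {'a': 0.2, 'b': 0.4, 'c': 0.1, 'd': 0.3}
blocks = [({'b'}, 0.42), ({'a', 'b'}, 0.28),   # 0.7 tall: 60% {b}, 40% {a,b}
          ({'d'}, 0.12), ({'c', 'd'}, 0.18)]   # 0.3 short: 40% {d}, 60% {c,d}
print(update(prior, blocks))
# {'a': 0.0933, 'b': 0.6067, 'c': 0.045, 'd': 0.255}  (up to rounding)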

References

1. G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.

2. R. Jeffrey, The Logic of Decision, McGraw-Hill, New York, 1965.

3. P. Garbolino, "Bayesian theory and artificial intelligence: The quarrelsome marriage," in Cognitive Engineering in Complex Dynamic Worlds, E. Hollnagel et al. (Eds.), Academic Press, New York, 1988.

4. P. Garbolino, "A comparison of some rules for probabilistic reasoning," in Cognitive Engineering in Complex Dynamic Worlds, E. Hollnagel et al. (Eds.), Academic Press, New York, 1988.

5. P. Diaconis and S.L. Zabell, "Updating subjective probability," Journal of the American Statistical Association, 77, 822-830 (1982).

6. Z. Domotor, "Probability kinematics and representation of belief change," Philosophy of Science, 47, 384-403 (1980).

7. Z. Domotor, "Probability kinematics, conditionals and entropy principles," Synthese, 63, 75-114 (1985).

8. Z. Domotor, Zannoti, and Graves, "Probability kinematics," Synthese, 44, 421-442 (1980).

9. G. Shafer, "Constructive probability," Synthese, 48, 1-60 (1981).

10. G. Shafer, "Jeffrey's rule of conditioning," Philosophy of Science, 48, 337-362 (1981).

11. B. van Fraassen, "Rational belief and probability kinematics," Philosophy of Science, 47, 165-187 (1980).

12. R.M. Williams, "Bayesian conditionalisation and the principle of minimum information," British Journal for the Philosophy of Science, 31, 131-144 (1980).

13. G. Shafer and A. Tversky, "Languages and designs for probability judgment," Cognitive Science, 9, 309-339 (1985).

14. D. Dubois and H. Prade, "Inference in possibilistic hypergraphs," 3rd Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Paris, 1990.

15. L. Zadeh, A book review of A Mathematical Theory of Evidence, AI Magazine, 5(3), 81-83 (1984).

16. L. Zadeh, "A simple view of the Dempster-Shafer theory of evidence," Berkeley Cognitive Science Report No. 27, University of California, 1984.

17. H. Field, "A note on Jeffrey conditionalisation," Philosophy of Science, 44, 361-367 (1978).

18. H. Ichihashi and H. Tanaka, "Jeffrey-like rules of conditioning for the Dempster theory of evidence," Int. J. Approximate Reasoning, 3(2), 143-156 (1989).

19. D. Dubois and H. Prade, "On the unicity of Dempster's rule of combination," Int. J. of Intelligent Systems, 1(2), 133-142 (1986).

20. R. Yager, "Toward a general theory of reasoning with uncertainty. I: Nonspecificity and fuzziness," Int. J. of Intelligent Systems, 1(1), 45-67 (1986).

21. P. Smets, "Belief functions," in Non-standard Logics for Automated Reasoning, P. Smets et al. (Eds.), Academic Press, New York, 1988.

22. L. Zadeh, "Fuzzy sets," Information and Control, 8, 338-353 (1965).

23. J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, Los Altos, CA, 1988.

24. J.F. Baldwin et al., FRIL Manual, Fril Systems Ltd, St. Anne's House, Bristol, UK.

25. J.F. Baldwin, "Evidential support logic programming," Fuzzy Sets and Systems, 24, 1-26 (1987).

26. J.F. Baldwin, "Support logic programming," Int. J. of Intelligent Systems, 1, 73-104 (1986).

27. J.F. Baldwin, "Computational models of uncertainty in expert systems," Computers & Mathematics with Applications, to appear.

28. J.F. Baldwin, Combining Evidences for Evidential Reasoning, ITRC Report, University of Bristol; presented at WARES Program, Blanes, 1989.