statistical reasoning - computer science and engineeringcs465/slides/ai8.pdf · statistical...


Upload: lamnguyet

Post on 05-May-2019


Chapter 8 1

Statistical Reasoning

• Probability and Bayes' Theorem

• Certainty Factors and Rule-Based Systems

• Bayesian Networks

• Dempster-Shafer Theory

• Fuzzy Logic

Chapter 8 2

Probability and Bayes' Theorem

P(Hi|E) = the probability that hypothesis Hi is true given evidence E

P(E|Hi) = the probability that we will observe evidence E given that hypothesis Hi is true

P(Hi) = the a priori probability that hypothesis Hi is true in the absence of any specific evidence. These probabilities are called prior probabilities or priors

k = the number of possible hypotheses

Bayes' theorem then states that

$$P(H_i \mid E) = \frac{P(E \mid H_i)\, P(H_i)}{\sum_{n=1}^{k} P(E \mid H_n)\, P(H_n)}$$
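As a minimal sketch of the formula above, the following Python function computes the posterior of every hypothesis from its prior and likelihood. The function name `posterior` and the three-hypothesis numbers are illustrative assumptions, not values from the slides.

```python
def posterior(priors, likelihoods):
    """Return P(H_i | E) for each hypothesis.

    priors[i]      = P(H_i)
    likelihoods[i] = P(E | H_i)
    """
    # Numerators P(E | H_i) * P(H_i), one per hypothesis.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: the sum over all k hypotheses, i.e. P(E).
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return [j / evidence for j in joint]


# Hypothetical numbers for k = 3 mutually exclusive, exhaustive hypotheses.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.1, 0.4, 0.8]
post = posterior(priors, likelihoods)
print(post)
```

The hypothesis with the smallest prior ends up with the largest posterior here because its likelihood is much higher than the others.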

Chapter 8 3

In real-life problems we usually have several pieces of evidence that are not independent.

Example
S: patient has spots
M: patient has measles
F: patient has high fever

Spots and fever are not independent events, and hence we cannot simply sum their effects. We need to represent explicitly the conditional probability that arises from their conjunction.

In general, given a prior body of evidence e and some new observation E, we need to compute

$$P(H \mid E, e) = P(H \mid E)\,\frac{P(e \mid E, H)}{P(e \mid E)}$$

The size of the set of joint probabilities required to compute this function grows as $2^n$ if there are n different propositions being considered.
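The update formula can be checked numerically. The sketch below builds a full joint distribution over three binary propositions (so it needs 2^3 = 8 entries) and confirms that the incremental formula agrees with conditioning directly on both pieces of evidence. The weights and the helper `prob` are made up for illustration.

```python
from itertools import product

# Hypothetical joint distribution over three binary propositions (H, E, e).
# With n propositions the table needs 2**n entries; here 2**3 = 8.
weights = [3, 1, 2, 2, 1, 4, 2, 5]
total = sum(weights)
outcomes = list(product([True, False], repeat=3))
joint = {o: w / total for o, w in zip(outcomes, weights)}

NAMES = ("H", "E", "e")


def prob(**fixed):
    """Marginal probability that the named propositions take the given values."""
    return sum(
        p for outcome, p in joint.items()
        if all(outcome[NAMES.index(name)] == value for name, value in fixed.items())
    )


# Direct computation: P(H | E, e) = P(H, E, e) / P(E, e)
direct = prob(H=True, E=True, e=True) / prob(E=True, e=True)

# Incremental computation: P(H | E) * P(e | E, H) / P(e | E)
p_h_given_E = prob(H=True, E=True) / prob(E=True)
p_e_given_EH = prob(H=True, E=True, e=True) / prob(H=True, E=True)
p_e_given_E = prob(E=True, e=True) / prob(E=True)
incremental = p_h_given_E * p_e_given_EH / p_e_given_E

print(direct, incremental)
```

Both routes need entries of the same joint table, which is why the storage and knowledge-acquisition costs grow exponentially with the number of propositions.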

Chapter 8 4

Applying Bayes' theorem directly is intractable for several reasons:

• The knowledge acquisition problem is insurmountable

• The space that would be required to store all the probabilities is too large

• The time required to compute the probabilities is too large

Chapter 8 5

Certainty Factors and Rule-Based Systems

A practical way of compromising on a pure Bayesian system was pioneered in the MYCIN system.

Example

If: (1) the stain of the organism is gram-positive, and
    (2) the morphology of the organism is coccus, and
    (3) the growth conformation of the organism is clumps,
then there is suggestive evidence (0.7) that the identity of the organism is staphylococcus

Chapter 8 6

Basic Definitions

• MB[h,e] - a measure (between 0 and 1) of belief in hypothesis h given the evidence e

• MD[h,e] - a measure (between 0 and 1) of disbelief in hypothesis h given the evidence e

• CF[h,e] - the certainty factor, defined as CF[h,e] = MB[h,e] - MD[h,e]

• Since any particular piece of evidence either supports or denies a hypothesis, a single number suffices to define both MB and MD, and thus the CF

Chapter 8 7

Combination Of Multiple Pieces Of Evidence

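[Figure: A and B each lead to C]

$$MB[h, s_1 \wedge s_2] = \begin{cases} 0 & \text{if } MD[h, s_1 \wedge s_2] = 1 \\ MB[h, s_1] + MB[h, s_2]\,(1 - MB[h, s_1]) & \text{otherwise} \end{cases}$$

$$MD[h, s_1 \wedge s_2] = \begin{cases} 0 & \text{if } MB[h, s_1 \wedge s_2] = 1 \\ MD[h, s_1] + MD[h, s_2]\,(1 - MD[h, s_1]) & \text{otherwise} \end{cases}$$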
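The combination rule for two pieces of evidence bearing on the same hypothesis can be sketched as follows. The function names are hypothetical, and the handling of the case where both MB and MD reach 1 (raising an error) is an assumption, since the slides do not say what to do there.

```python
def combine_measure(x1, x2):
    """Combine two MB values (or two MD values) for the same hypothesis."""
    return x1 + x2 * (1.0 - x1)


def certainty_factor(mb1, md1, mb2, md2):
    """Return CF[h, s1 and s2] = MB[h, s1 and s2] - MD[h, s1 and s2]."""
    mb = combine_measure(mb1, mb2)
    md = combine_measure(md1, md2)
    # Assumption: certain belief and certain disbelief together are an error.
    if mb == 1.0 and md == 1.0:
        raise ValueError("evidence both confirms and refutes h with certainty")
    # Boundary cases from the definitions: certainty on one side
    # cancels the opposing measure.
    if md == 1.0:
        mb = 0.0
    if mb == 1.0:
        md = 0.0
    return mb - md


# Hypothetical example: two observations each lend some support to h,
# and neither gives grounds for disbelief.
cf = certainty_factor(mb1=0.3, md1=0.0, mb2=0.2, md2=0.0)
print(round(cf, 2))
```

The combined belief is larger than either input but stays below 1, and the result does not depend on the order in which the two observations arrive.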

Chapter 8 8

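Combination Of Multiple Pieces Of Evidence (continued)

Conjunction (A and B): MB[h1 ∧ h2, e] = min(MB[h1,e], MB[h2,e])

Disjunction (A or B): MB[h1 ∨ h2, e] = max(MB[h1,e], MB[h2,e])

Chaining (A → B → C): MB[h,s] = MB'[h,s] · max(0, CF[s,e]), where MB'[h,s] is the measure of belief in h when s is known with certainty and e is the evidence for s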
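These three rules are enough to evaluate a rule whose premise is itself uncertain. The sketch below assumes hypothetical belief values for three premise conditions and a rule strength of 0.7; the function names are illustrative, and treating the premise's CF as equal to its MB assumes there is no evidence against it.

```python
def mb_and(*beliefs):
    """Belief in a conjunction: the weakest conjunct limits the whole."""
    return min(beliefs)


def mb_or(*beliefs):
    """Belief in a disjunction: the strongest disjunct carries the whole."""
    return max(beliefs)


def mb_chained(mb_prime, cf_premise):
    """Belief in h via an uncertain premise s: MB'[h,s] * max(0, CF[s,e])."""
    return mb_prime * max(0.0, cf_premise)


# Hypothetical beliefs in the three conditions of a rule's premise.
premise_beliefs = [0.9, 0.8, 0.6]
# Assumption: no disbelief in the premise, so its CF equals its MB.
cf_premise = mb_and(*premise_beliefs)

# Rule strength 0.7 plays the role of MB'[h,s].
belief_in_conclusion = mb_chained(0.7, cf_premise)
print(round(belief_in_conclusion, 2))
```

A premise with a negative certainty factor contributes nothing, because `max(0, CF[s,e])` clips it to zero.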

Chapter 8 9

Advantages

• The approach makes strong independence assumptions that make it relatively easy to use

• The approach can serve as the basis of practical application programs

• It appears to mimic the way people manipulate certainties

Disadvantages

• The independence assumptions are dangerous if rules are not written carefully

• No solid theoretical basis

Chapter 8 10

Dempster-Shafer Theory

• This approach considers sets of propositions and assigns to each of them an interval:

[Belief, Plausibility]

• Belief (Bel) measures the strength of the evidence in favor of a set of propositions. It ranges from 0 (no evidence) to 1 (certainty)

• Plausibility (Pl) measures the extent to which evidence in favor of ¬s leaves room for belief in s. It also ranges from 0 to 1 and is defined as:

Pl(s) = 1 - Bel(¬s)

• Θ is an exhaustive universe of mutually exclusive hypotheses (the frame of discernment)

Chapter 8 11

• m(p) measures the amount of belief that is currently assigned to exactly the set p of hypotheses.

• If Θ contains n elements, then there are $2^n$ subsets of Θ. We must assign m so that the sum of all the m values assigned to the subsets of Θ is 1

• Although dealing with $2^n$ values may appear intractable, it usually turns out that many of the subsets will never need to be considered.

Suppose we are given two belief functions m1 and m2. Let X range over the subsets of Θ to which m1 assigns a nonzero value, and let Y range over the corresponding subsets for m2. We define the combination m3 of m1 and m2 to be:

$$m_3(Z) = \frac{\sum_{X \cap Y = Z} m_1(X)\, m_2(Y)}{1 - \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)}$$
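A minimal sketch of this combination rule in Python follows, representing each belief function as a dictionary from `frozenset` to mass. The name `dempster_combine` and the two-element frame used to exercise it are illustrative assumptions.

```python
from collections import defaultdict


def dempster_combine(m1, m2):
    """Combine two mass functions given as {frozenset: mass} dictionaries."""
    raw = defaultdict(float)
    # Numerator: accumulate m1(X) * m2(Y) on the intersection Z of X and Y.
    for x, mass_x in m1.items():
        for y, mass_y in m2.items():
            raw[x & y] += mass_x * mass_y
    # Mass that landed on the empty set measures the conflict.
    conflict = raw.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: the belief functions cannot be combined")
    # Denominator: 1 minus the conflicting mass.
    return {z: mass / (1.0 - conflict) for z, mass in raw.items()}


# Hypothetical frame with two hypotheses and partly conflicting evidence.
theta = frozenset({"a", "b"})
m1 = {frozenset({"a"}): 0.6, theta: 0.4}
m2 = {frozenset({"b"}): 0.5, theta: 0.5}
m3 = dempster_combine(m1, m2)
for subset, mass in m3.items():
    print(sorted(subset), round(mass, 3))
```

Only subsets that actually receive mass appear in the dictionaries, which reflects the point above that most of the $2^n$ subsets never need to be considered.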

Chapter 8 12

Example

Assume that Θ = {A,F,C,P}, where A: allergy, F: flu, C: cold, P: pneumonia.

Our measure of belief before observing any symptom is:
m(Θ) = 1.0

Suppose that m1 corresponds to our belief after observing fever:
m1({F,C,P}) = 0.6
m1(Θ) = 0.4

Suppose that m2 corresponds to our belief after observing a runny nose:
m2({A,F,C}) = 0.8
m2(Θ) = 0.2

Computing the combination m3:

                      m2({A,F,C}) = 0.8     m2(Θ) = 0.2
m1({F,C,P}) = 0.6     {F,C}    0.48         {F,C,P}  0.12
m1(Θ) = 0.4           {A,F,C}  0.32         Θ        0.08

Chapter 8 13

Now let m4 correspond to our belief given just the evidence that the problem goes away when the patient goes on a trip:

m4({A}) = 0.9
m4(Θ) = 0.1

                       m4({A}) = 0.9     m4(Θ) = 0.1
m3({F,C}) = 0.48       ∅    0.432        {F,C}    0.048
m3({A,F,C}) = 0.32     {A}  0.288        {A,F,C}  0.032
m3({F,C,P}) = 0.12     ∅    0.108        {F,C,P}  0.012
m3(Θ) = 0.08           {A}  0.072        Θ        0.008

But there is now a total belief of 0.54 associated with ∅; only 0.46 is associated with outcomes that are in fact possible. So we need to scale the remaining values by dividing them by the factor 1 - 0.54 = 0.46. Then m5 is:

m5({F,C}) = 0.104
m5({A,F,C}) = 0.070
m5({F,C,P}) = 0.026
m5({A}) = 0.783
m5(Θ) = 0.017
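The worked example can be reproduced in two steps, mirroring the slides: first build the table of products, then rescale by the non-conflicting mass. The sketch below also derives the [Belief, Plausibility] interval for allergy from m5. The function names are illustrative, and computing Bel as the sum of the masses of all subsets of a set is the standard Dempster-Shafer definition, which the slides do not spell out.

```python
from collections import defaultdict

THETA = frozenset("AFCP")  # allergy, flu, cold, pneumonia


def product_table(m1, m2):
    """Unnormalized products m1(X) * m2(Y), grouped by the intersection of X and Y."""
    raw = defaultdict(float)
    for x, mass_x in m1.items():
        for y, mass_y in m2.items():
            raw[x & y] += mass_x * mass_y
    return dict(raw)


def normalize(raw):
    """Discard the mass on the empty set and rescale the rest to sum to 1."""
    conflict = raw.get(frozenset(), 0.0)
    return {z: mass / (1.0 - conflict) for z, mass in raw.items() if z}


def bel(m, s):
    """Belief in s: total mass committed to subsets of s (standard definition)."""
    return sum(mass for z, mass in m.items() if z <= s)


def pl(m, s):
    """Plausibility of s: 1 - Bel(not s)."""
    return 1.0 - bel(m, THETA - s)


m1 = {frozenset("FCP"): 0.6, THETA: 0.4}   # after observing fever
m2 = {frozenset("AFC"): 0.8, THETA: 0.2}   # second symptom
m4 = {frozenset("A"): 0.9, THETA: 0.1}     # problem goes away on a trip

m3 = normalize(product_table(m1, m2))      # no conflict in this step
raw5 = product_table(m3, m4)               # includes mass on the empty set
m5 = normalize(raw5)

print("conflict:", round(raw5[frozenset()], 2))
for subset, mass in sorted(m5.items(), key=lambda item: -item[1]):
    print("".join(sorted(subset)), round(mass, 3))

allergy = frozenset("A")
print("[Bel, Pl] for {A}:", round(bel(m5, allergy), 3), round(pl(m5, allergy), 3))
```

Running this gives the same rounded m5 values as the slide, and the interval for {A} shows how little room the remaining evidence leaves for doubt about allergy.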