Abduction, Uncertainty, and Probabilistic Reasoning: Chapters 13, 14, and more


Upload: thomasina-russell

Post on 31-Dec-2015



Introduction

• Abduction is a reasoning process that tries to form plausible explanations for abnormal observations
  – Abduction is distinctly different from deduction and induction
  – Abduction is inherently uncertain

• Uncertainty has become an important issue in AI research
• Some major formalisms for representing and reasoning about uncertainty:
  – Mycin’s certainty factor (an early representative)
  – Probability theory (esp. Bayesian networks)
  – Dempster-Shafer theory
  – Fuzzy logic
  – Truth maintenance systems


Abduction

• Definition (Encyclopedia Britannica): reasoning that derives an explanatory hypothesis from a given set of facts
  – The inference result is a hypothesis that, if true, could explain the occurrence of the given facts
• Examples
  – Dendral, an expert system to construct 3D structures of chemical compounds
    • Facts: mass spectrometer data of the compound and the chemical formula of the compound
    • KB: chemistry, esp. the strength of different types of bonds
    • Reasoning: form a hypothetical 3D structure that meets the given chemical formula and would most likely produce the given mass spectrum if subjected to electron beam bombardment


  – Medical diagnosis
    • Facts: symptoms, lab test results, and other observed findings (called manifestations)
    • KB: causal associations between diseases and manifestations
    • Reasoning: one or more diseases whose presence would causally explain the occurrence of the given manifestations
  – Many other reasoning processes (e.g., word sense disambiguation in natural language processing, image understanding, detective work, etc.) can also be seen as abductive reasoning.


Comparing abduction, deduction and induction

Deduction:
  major premise: All balls in the box are black
  minor premise: These balls are from the box
  conclusion: These balls are black
  Form: A => B, A; therefore B
Abduction:
  rule: All balls in the box are black
  observation: These balls are black
  explanation: These balls are from the box
  Form: A => B, B; therefore possibly A
Induction:
  case: These balls are from the box
  observation: These balls are black
  hypothesized rule: All balls in the box are black
  Form: whenever A then B (but not vice versa); therefore possibly A => B

Induction: from specific cases to general rules
Abduction and deduction: both go from one part of a specific case to another part of the case, using general rules (in different ways)


Characteristics of abduction reasoning

1. Reasoning results are hypotheses, not theorems (they may be false even if the rules and facts are true)
   – e.g., misdiagnosis in medicine
2. There may be multiple plausible hypotheses
   – Given rules A => B and C => B, and fact B, both A and C are plausible hypotheses
   – Abduction is inherently uncertain
   – Hypotheses can be ranked by their plausibility, if that can be determined
3. Reasoning is often a hypothesize-and-test cycle
   – Hypothesize phase: postulate possible hypotheses, each of which could explain the given facts (or most of the important facts)
   – Test phase: test the plausibility of all or some of these hypotheses


   – One way to test a hypothesis H is to query whether something that is currently unknown, but can be predicted from H, is actually true
     • If we also know A => D and C => E, then ask whether D and E are true
     • If it turns out that D is true and E is false, then hypothesis A becomes more plausible (support for A increases, support for C decreases)
   – Alternative hypotheses compete with each other (Occam’s razor, explaining away)
4. Reasoning is non-monotonic
   – The plausibility of hypotheses can increase or decrease as new facts are collected (deductive inference determines whether a sentence is true, but would never change its truth value)
   – Some hypotheses may be discarded/defeated, and new ones may be formed, when new observations are made
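The hypothesize-and-test cycle above can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical rule base that maps each cause to the facts it predicts (A => B, A => D, C => B, C => E, as in the example) and an assumed scoring scheme: +1 for each extra prediction that is confirmed and -1 for each one that is refuted. The function names are illustrative only.

```python
# Hypothetical rule base from the example: cause -> set of facts it predicts.
RULES = {
    "A": {"B", "D"},  # A => B, A => D
    "C": {"B", "E"},  # C => B, C => E
}


def hypothesize(observed, rules):
    """Hypothesize phase: every cause whose rules would explain the observed fact."""
    return sorted(cause for cause, effects in rules.items() if observed in effects)


def support(hypothesis, observed, rules, findings):
    """Test phase: +1 per confirmed extra prediction, -1 per refuted one.

    `findings` maps a queried fact to True/False; facts not yet queried are skipped.
    """
    score = 0
    for fact in rules[hypothesis] - {observed}:
        if fact in findings:
            score += 1 if findings[fact] else -1
    return score


candidates = hypothesize("B", RULES)
findings = {"D": True, "E": False}  # D turned out true, E turned out false
scores = {h: support(h, "B", RULES, findings) for h in candidates}
print(candidates)  # ['A', 'C']
print(scores)      # {'A': 1, 'C': -1}: A gains support, C loses it
```

Adding entries to `findings` and re-scoring shows the non-monotonic behavior of point 4: a hypothesis's standing can rise or fall as new facts arrive.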


Sources of Uncertainty

• Uncertain data (noise or partial observation)
• Uncertain knowledge (e.g., causal relations)
  – A disorder may cause any and all POSSIBLE manifestations in a specific case
  – A manifestation can be caused by more than one POSSIBLE disorder
• Uncertain reasoning results
  – Abduction and induction are inherently uncertain
  – Default reasoning, even in deductive fashion, is uncertain
  – Incomplete deductive inference may be uncertain


Probabilistic Inference

• Based on probability theory (especially Bayes’ theorem)
  – A well-established discipline about uncertain outcomes
  – An empirical science like physics/chemistry: it can be verified by experiments
• Probability theory is too rigid to apply directly in many knowledge-based applications
  – Some assumptions have to be made to simplify reality
  – Different formalisms have been developed in which some aspects of probability theory are changed/modified
• We will briefly review the basics of probability theory before discussing different approaches to uncertainty
• The presentation uses the diagnostic process (an abductive and evidential reasoning process) as an example


Probability of Events

• Sample space and events
  – Sample space S (e.g., all people in an area)
  – Events E1 ⊆ S (e.g., all people having a cough)
           E2 ⊆ S (e.g., all people having a cold)
• Prior (marginal) probabilities of events
  – P(E) = |E| / |S| (frequency interpretation)
  – P(E) = 0.1 (subjective probability)
  – 0 <= P(E) <= 1 for all events
  – Two special events, ∅ and S: P(∅) = 0 and P(S) = 1.0
• Boolean operators between events (to form compound events)
  – Conjunctive (intersection): E1 ^ E2 (E1 ∩ E2)
  – Disjunctive (union): E1 v E2 (E1 ∪ E2)
  – Negation (complement): ~E (E^C = S - E)


• Probabilities of compound events
  – P(~E) = 1 - P(E), because P(~E) + P(E) = 1
  – P(E1 v E2) = P(E1) + P(E2) - P(E1 ^ E2)
  – But how do we compute the joint probability P(E1 ^ E2)?
• Conditional probability (of E1, given E2)
  – How likely E1 is to occur within the subspace of E2

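[Figure: Venn diagrams showing an event E and its complement ~E, and two overlapping events E1 and E2 with intersection E1 ^ E2]

$$P(E_1 \mid E_2) = \frac{|E_1 \wedge E_2|}{|E_2|} = \frac{|E_1 \wedge E_2| / |S|}{|E_2| / |S|} = \frac{P(E_1 \wedge E_2)}{P(E_2)}$$

$$P(E_1 \wedge E_2) = P(E_1 \mid E_2)\, P(E_2)$$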
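The frequency interpretation and the formulas for compound and conditional probabilities can be checked on a small finite sample space. This is a minimal sketch, assuming a made-up population of 20 people; the sets `S`, `E1`, and `E2` and the helper names `P` and `P_given` are hypothetical. Exact fractions are used so the identities hold without rounding error.

```python
from fractions import Fraction

# Hypothetical sample space: 20 people with ids 1..20.
S = set(range(1, 21))
E1 = {1, 2, 3, 4, 5, 6}            # people having a cough
E2 = {4, 5, 6, 7, 8, 9, 10, 11}    # people having a cold


def P(event):
    """Frequency interpretation: P(E) = |E| / |S|."""
    return Fraction(len(event), len(S))


def P_given(e1, e2):
    """Conditional probability: P(E1 | E2) = |E1 and E2| / |E2|."""
    return Fraction(len(e1 & e2), len(e2))


# Python set operators: & is intersection (E1 ^ E2 in the slides' notation),
# | is union (E1 v E2), and S - E is the complement ~E.
print(P(S - E1) == 1 - P(E1))                      # True
print(P(E1 | E2) == P(E1) + P(E2) - P(E1 & E2))    # True
print(P_given(E1, E2))                             # 3/8
print(P(E1 & E2) == P_given(E1, E2) * P(E2))       # True
```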


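• Independence assumption
  – Two events E1 and E2 are said to be independent of each other if
    $$P(E_1 \mid E_2) = P(E_1)$$
    (knowing that E2 occurred does not change the likelihood of E1)
  – Computation can be simplified with independent events:
    $$P(E_1 \wedge E_2) = P(E_1 \mid E_2)\, P(E_2) = P(E_1)\, P(E_2)$$
    $$P(E_1 \vee E_2) = P(E_1) + P(E_2) - P(E_1 \wedge E_2) = P(E_1) + P(E_2) - P(E_1)\, P(E_2) = 1 - (1 - P(E_1))(1 - P(E_2))$$
• Mutually exclusive (ME) and exhaustive (EXH) sets of events
  – ME: $E_i \wedge E_j = \emptyset \;\; (P(E_i \wedge E_j) = 0), \quad i, j = 1, \dots, n, \; i \neq j$
  – EXH: $E_1 \vee \dots \vee E_n = S \;\; (P(E_1 \vee \dots \vee E_n) = 1)$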


Bayes’ Theorem

• In the setting of diagnostic/evidential reasoning
  – Know the prior probabilities of hypotheses, $P(H_i)$, and the conditional probabilities $P(E_j \mid H_i)$
  – Want to compute the posterior probability $P(H_i \mid E_j)$
  – The hypothesis with the greatest posterior probability may be taken as the most plausible diagnosis, because it is the most probable cause of the given manifestations

[Figure: a layer of hypotheses (H_i, with prior P(H_i)) linked to a layer of evidence/manifestations (E_1, ..., E_j, ..., E_m) by arcs labeled P(E_j | H_i)]


Bayes’ Theorem

• The computation is called Bayesian reasoning
  – From priors and conditionals to posteriors
• Bayes’ theorem (formula 1):
  $$P(H_i \mid E_j) = \frac{P(H_i)\, P(E_j \mid H_i)}{P(E_j)}$$
• If the purpose is to find which of the n hypotheses $H_1, \dots, H_n$ is more plausible for the given $E_j$, then we can ignore the denominator and rank them using the relative likelihood
  $$\mathrm{rel}(H_i \mid E_j) = P(E_j \mid H_i)\, P(H_i)$$


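• $P(E_j)$ can be computed from $P(E_j \mid H_i)$ and $P(H_i)$ if we assume all hypotheses $H_1, \dots, H_n$ are ME and EXH:
  $$\begin{aligned} P(E_j) &= P(E_j \wedge (H_1 \vee \dots \vee H_n)) && \text{(by EXH)} \\ &= \sum_{i=1}^{n} P(E_j \wedge H_i) && \text{(by ME)} \\ &= \sum_{i=1}^{n} P(E_j \mid H_i)\, P(H_i) \end{aligned}$$
• Then we have another version of Bayes’ theorem:
  $$P(H_i \mid E_j) = \frac{P(E_j \mid H_i)\, P(H_i)}{\sum_{k=1}^{n} P(E_j \mid H_k)\, P(H_k)} = \frac{\mathrm{rel}(H_i \mid E_j)}{\sum_{k=1}^{n} \mathrm{rel}(H_k \mid E_j)}$$
  where $\sum_{k=1}^{n} P(E_j \mid H_k)\, P(H_k)$, the sum of the relative likelihoods of all n hypotheses, equals $P(E_j)$ and is a normalization factor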
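Both versions of Bayes’ theorem can be illustrated with a few numbers. This is a minimal sketch, assuming three hypothetical ME and EXH hypotheses (`flu`, `cold`, `healthy`) and a single piece of evidence (a cough) with made-up priors and likelihoods. Ranking needs only the relative likelihoods; the posteriors additionally divide by their sum, which equals P(E_j) only because the hypotheses are ME and EXH.

```python
# Hypothetical numbers: three ME & EXH hypotheses and one piece of evidence E_j (a cough).
priors = {"flu": 0.10, "cold": 0.30, "healthy": 0.60}      # P(H_i), sums to 1
likelihood = {"flu": 0.90, "cold": 0.60, "healthy": 0.05}  # P(E_j | H_i)

# Relative likelihood rel(H_i | E_j) = P(E_j | H_i) P(H_i): enough for ranking.
rel = {h: likelihood[h] * priors[h] for h in priors}
ranking = sorted(rel, key=rel.get, reverse=True)

# Under ME & EXH, the sum of the relative likelihoods is P(E_j), the normalizer.
p_evidence = sum(rel.values())
posterior = {h: r / p_evidence for h, r in rel.items()}

print(ranking)                                         # ['cold', 'flu', 'healthy']
print({h: round(p, 3) for h, p in posterior.items()})  # {'flu': 0.3, 'cold': 0.6, 'healthy': 0.1}
```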


Probabilistic Inference for simple diagnostic problems

• Knowledge base:
  – Evidence/manifestations: $E_1, \dots, E_m$
  – Hypotheses/disorders: $H_1, \dots, H_n$
    (the $E_j$ and $H_i$ are binary, and the hypotheses form an ME & EXH set)
  – Conditional probabilities: $P(E_j \mid H_i)$, $i = 1, \dots, n$, $j = 1, \dots, m$
  – Prior probabilities: $P(H_i)$, $i = 1, \dots, n$
• Case input: $E_1, \dots, E_l$
• Find the hypothesis $H_i$ with the highest posterior probability $P(H_i \mid E_1, \dots, E_l)$


• By Bayes’ theorem:
  $$P(H_i \mid E_1, \dots, E_l) = \frac{P(E_1, \dots, E_l \mid H_i)\, P(H_i)}{P(E_1, \dots, E_l)}$$
• How do we deal with multiple pieces of evidence?
  – Assume all pieces of evidence are conditionally independent, given any hypothesis:
    $$P(E_1, \dots, E_l \mid H_i) = \prod_{j=1}^{l} P(E_j \mid H_i)$$
  – We then have
    $$P(H_i \mid E_1, \dots, E_l) = \frac{P(H_i) \prod_{j=1}^{l} P(E_j \mid H_i)}{P(E_1, \dots, E_l)}$$
  – How do we deal with $P(E_1, \dots, E_l)$?


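• The relative likelihood:
  $$\mathrm{rel}(H_i \mid E_1, \dots, E_l) = P(E_1, \dots, E_l \mid H_i)\, P(H_i) = P(H_i) \prod_{j=1}^{l} P(E_j \mid H_i)$$
• The absolute posterior probability:
  $$P(H_i \mid E_1, \dots, E_l) = \frac{\mathrm{rel}(H_i \mid E_1, \dots, E_l)}{\sum_{k=1}^{n} \mathrm{rel}(H_k \mid E_1, \dots, E_l)} = \frac{P(H_i) \prod_{j=1}^{l} P(E_j \mid H_i)}{\sum_{k=1}^{n} P(H_k) \prod_{j=1}^{l} P(E_j \mid H_k)}$$
• Evidence accumulation (when new evidence is discovered):
  $$\mathrm{rel}(H_i \mid E_1, \dots, E_l, E_{l+1}) = P(E_{l+1} \mid H_i)\, \mathrm{rel}(H_i \mid E_1, \dots, E_l)$$
  $$\mathrm{rel}(H_i \mid E_1, \dots, E_l, \neg E_{l+1}) = (1 - P(E_{l+1} \mid H_i))\, \mathrm{rel}(H_i \mid E_1, \dots, E_l)$$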
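The evidence-accumulation rules translate directly into an incremental update. This is a minimal sketch, assuming a hypothetical knowledge base with two ME and EXH hypotheses (`H1`, `H2`) and two binary manifestations (`fever`, `rash`) with made-up probabilities; the helper names are illustrative. A present finding multiplies by P(E | H_i), an absent one by 1 - P(E | H_i), and normalization is deferred to the end.

```python
# Hypothetical KB: two ME & EXH hypotheses and two binary manifestations.
PRIOR = {"H1": 0.2, "H2": 0.8}      # P(H_i)
COND = {                            # P(E_j | H_i)
    "H1": {"fever": 0.9, "rash": 0.7},
    "H2": {"fever": 0.2, "rash": 0.1},
}


def accumulate(rel, finding, present):
    """One evidence-accumulation step under conditional independence.

    Multiplies rel(H_i | ...) by P(E | H_i) if the finding is present,
    or by 1 - P(E | H_i) if it is observed absent.
    """
    return {h: r * (COND[h][finding] if present else 1.0 - COND[h][finding])
            for h, r in rel.items()}


def normalize(rel):
    """Absolute posterior: each rel divided by the sum over all hypotheses."""
    total = sum(rel.values())
    return {h: r / total for h, r in rel.items()}


rel = dict(PRIOR)                     # before any evidence, rel(H_i) = P(H_i)
rel = accumulate(rel, "fever", True)  # fever observed
rel = accumulate(rel, "rash", False)  # rash observed absent
post = normalize(rel)
print({h: round(p, 4) for h, p in post.items()})  # {'H1': 0.2727, 'H2': 0.7273}
```

Keeping unnormalized relative likelihoods between steps means each new finding costs one multiplication per hypothesis.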


Assessment of Assumptions

• Assumption 1: hypotheses are mutually exclusive and exhaustive
  – Single-fault assumption (exactly one hypothesis must be true)
  – Multiple faults do exist in individual cases
  – Can be viewed as an approximation of situations where hypotheses are independent of each other and their prior probabilities are very small:
    $$P(H_1 \wedge H_2) = P(H_1)\, P(H_2) \approx 0 \quad \text{if both } P(H_1) \text{ and } P(H_2) \text{ are very small}$$
• Assumption 2: pieces of evidence are conditionally independent of each other, given any hypothesis
  – Manifestations themselves are not independent of each other; they are correlated by their common causes
  – Reasonable under the single-fault assumption
  – Not so when multiple faults are to be considered


Limitations of the simple Bayesian system

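• Cannot handle hypotheses of multiple disorders well
  – Suppose $H_1, \dots, H_n$ are independent of each other
  – Consider a composite hypothesis $H_1 \wedge H_2$
  – How do we compute the posterior probability (or relative likelihood) $P(H_1 \wedge H_2 \mid E_1, \dots, E_l)$?
  – Using Bayes’ theorem:
    $$P(H_1 \wedge H_2 \mid E_1, \dots, E_l) = \frac{P(E_1, \dots, E_l \mid H_1 \wedge H_2)\, P(H_1 \wedge H_2)}{P(E_1, \dots, E_l)}$$
    $$P(H_1 \wedge H_2) = P(H_1)\, P(H_2) \quad \text{because they are independent}$$
    $$P(E_1, \dots, E_l \mid H_1 \wedge H_2) = \prod_{j=1}^{l} P(E_j \mid H_1 \wedge H_2) \quad \text{assuming the } E_j \text{ are independent, given } H_1 \wedge H_2$$
    How do we compute $P(E_j \mid H_1 \wedge H_2)$?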


  – Assuming $H_1, \dots, H_n$ are independent, given $E_1, \dots, E_l$?
    $$P(H_1 \wedge H_2 \mid E_1, \dots, E_l) = P(H_1 \mid E_1, \dots, E_l)\, P(H_2 \mid E_1, \dots, E_l)$$
    but this is a very unreasonable assumption:
    E: earthquake, B: burglar, A: alarm set off (E -> A <- B)
    E and B are independent, but when A is given, they become (negatively) dependent, because they become competitors to explain A:
    P(B | A, E) << P(B | A)
• Cannot handle causal chaining
  – Ex.: A: weather of the year; B: cotton production of the year; C: cotton price of next year
  – Observed: A influences C
  – The influence is not direct (A -> B -> C)
    P(C | B, A) = P(C | B): instantiation of B blocks the influence of A on C
• Need a better representation and better assumptions
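Explaining away can be checked numerically. This is a minimal sketch, assuming made-up CPT values for the earthquake/burglar/alarm network (E -> A <- B). Only the network shape comes from the slide, and the function names are hypothetical. It computes P(B | A) and P(B | A, E) by summing the joint distribution.

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers only).
P_B = 0.01   # P(burglar)
P_E = 0.02   # P(earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(alarm | B, E)


def joint(b, e, a):
    """P(B=b, E=e, A=a) = P(b) P(e) P(a | b, e); B and E are marginally independent."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa


def prob_burglar(alarm=True, quake=None):
    """P(B=true | A=alarm), or P(B=true | A=alarm, E=quake) if quake is given."""
    quakes = [True, False] if quake is None else [quake]
    num = sum(joint(True, e, alarm) for e in quakes)
    den = sum(joint(b, e, alarm) for b, e in product([True, False], quakes))
    return num / den


print(round(prob_burglar(), 3))            # P(B | A)
print(round(prob_burglar(quake=True), 3))  # P(B | A, E): much smaller, E explains A away
```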


Bayesian Networks (BNs)

• Definition: BN = (DAG, CPD)
  – DAG: directed acyclic graph (the BN’s structure)
    • Nodes: random variables (typically binary or discrete, but methods also exist to handle continuous variables)
    • Arcs: indicate probabilistic dependencies between nodes (lack of an arc signifies conditional independence)
  – CPD: conditional probability distribution (the BN’s parameters)
    • Conditional probabilities at each node, $P(x_i \mid \pi_i)$, where $\pi_i$ is the set of all parent nodes of $x_i$, usually stored as a table (conditional probability table, or CPT)
  – Root nodes are a special case: they have no parents, so $\pi_i = \emptyset$ and $P(x_i \mid \pi_i) = P(x_i)$, i.e., just use priors in the CPD


Example BN

[Figure: network with arcs A -> B, A -> C, B -> D, C -> D, C -> E]

P(a) = 0.001
P(b|a) = 0.3          P(b|¬a) = 0.001
P(c|a) = 0.2          P(c|¬a) = 0.005
P(d|b,c) = 0.1        P(d|b,¬c) = 0.01       P(d|¬b,c) = 0.01       P(d|¬b,¬c) = 0.00001
P(e|c) = 0.4          P(e|¬c) = 0.002

Note that we only specify P(a) etc., not P(¬a), since they have to add to one

Uppercase: variables (A, B, …); lowercase: values/states of variables (A has two states, a and ¬a)


Conditional independence and chaining

• Conditional independence assumption
  – $$P(x_i \mid \pi_i, q) = P(x_i \mid \pi_i)$$
    where q is any set of variables (nodes) other than $x_i$ and its descendants
  – $\pi_i$ blocks the influence of other nodes on $x_i$ and its descendants (q influences $x_i$ only through variables in $\pi_i$)
  – With this assumption, the complete joint probability distribution of all variables in the network can be represented by (recovered from) local CPDs by chaining these CPDs:
    $$P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \pi_i)$$


Chaining: Example

Computing the joint probability for all variables is easy. The joint distribution of all variables:

P(A, B, C, D, E) = P(E | A, B, C, D) P(A, B, C, D)            by the product rule
                 = P(E | C) P(A, B, C, D)                     by the cond. indep. assumption
                 = P(E | C) P(D | A, B, C) P(A, B, C)         by the product rule
                 = P(E | C) P(D | B, C) P(C | A, B) P(A, B)   by cond. indep. and the product rule
                 = P(E | C) P(D | B, C) P(C | A) P(B | A) P(A)

[Figure: the example network A -> B, A -> C, B -> D, C -> D, C -> E]
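The chained factorization can be written directly as code for the example network's CPTs. This is a minimal sketch; the dictionary layout and the helper names `bern` and `joint` are hypothetical. As a sanity check, it sums the chained product over all 32 atomic events, which should give one.

```python
from itertools import product

# CPTs of the example network (A -> B, A -> C, B -> D, C -> D, C -> E).
# Each entry is P(variable = true | parent values); P(false | ...) = 1 - that.
P_a = 0.001
P_b = {True: 0.3, False: 0.001}                       # P(b | A)
P_c = {True: 0.2, False: 0.005}                       # P(c | A)
P_d = {(True, True): 0.1, (True, False): 0.01,
       (False, True): 0.01, (False, False): 0.00001}  # P(d | B, C)
P_e = {True: 0.4, False: 0.002}                       # P(e | C)


def bern(p_true, value):
    """Probability of a binary value, given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d, e):
    """Chain rule for this BN: P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)."""
    return (bern(P_a, a) * bern(P_b[a], b) * bern(P_c[a], c)
            * bern(P_d[(b, c)], d) * bern(P_e[c], e))


print(joint(True, True, True, True, True))  # 0.001 * 0.3 * 0.2 * 0.1 * 0.4, about 2.4e-06
total = sum(joint(*v) for v in product([True, False], repeat=5))
print(round(total, 10))                     # 1.0: the 32 atomic events sum to one
```

Only 1 + 2 + 2 + 4 + 2 = 11 numbers are stored, instead of the 31 free entries a full joint table over five binary variables would need.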


Topological semantics

• A node is conditionally independent of its non-descendants, given its parents
• A node is conditionally independent of all other nodes in the network, given its parents, children, and children’s parents (also known as its Markov blanket)
• The method called d-separation can be applied to decide whether a set of nodes X is independent of another set Y, given a third set Z
  – Chain (A -> B -> C): A and C are independent, given B
  – Diverging (B <- A -> C): B and C are independent, given A
  – Converging (B -> A <- C): B and C are independent only when A (and every descendant of A) is NOT given
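The Markov blanket definition is easy to compute from the graph structure. This is a minimal sketch, assuming the earlier example network encoded as a hypothetical child-to-parents dictionary. It collects parents, children, and the children's other parents.

```python
# The example network (A -> B, A -> C, B -> D, C -> D, C -> E) as node -> list of parents.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}


def markov_blanket(node, parents):
    """Parents, children, and children's parents of `node` (excluding the node itself)."""
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents[node]) | set(children)
    for child in children:
        blanket |= set(parents[child])
    blanket.discard(node)
    return blanket


print(sorted(markov_blanket("B", PARENTS)))  # ['A', 'C', 'D']
print(sorted(markov_blanket("C", PARENTS)))  # ['A', 'B', 'D', 'E']
```

For B, the blanket includes C only because C is a co-parent of D, the converging case above.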


Inference tasks

• Simple queries: compute the posterior marginal P(Xi | E=e)
  – E.g., P(NoGas | Gauge=empty, Lights=on, Starts=false)
  – Posteriors for ALL nonevidence nodes (belief update)
  – Priors for any/all nodes (E = ∅)
• Conjunctive queries:
  – P(Xi, Xj | E=e) = P(Xi | E=e) P(Xj | Xi, E=e)
• Optimal decisions: decision networks or influence diagrams include utility information and actions
  – Maximize expected utility:
    $$\max_{\text{action}} \sum_{\text{outcome}} U(\text{outcome})\, P(\text{outcome} \mid \text{action}, \text{evidence})$$
  – Probabilistic inference is required to find P(outcome | action, evidence)


• MAP problems (explanation)
  – Let X denote the set of all variables in a BN, V the set of instantiated variables, and U the set of all un-instantiated variables. Then the MAP (maximum a posteriori probability) problem is to find the most probable instantiation u of U, given V, i.e.,
    $$\max_{u} P(U = u \mid V)$$
  – The solution provides a good explanation for your action
  – This is an optimization problem
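For a network as small as the example, MAP can be solved by brute force. This is a minimal sketch, assuming the example CPTs with E = true observed as V and U = {A, B, C, D}; the helper names are hypothetical. Because P(u | e) is proportional to P(u, e), it maximizes the chained joint directly and skips the normalizer.

```python
from itertools import product

# CPTs of the example network (A -> B, A -> C, B -> D, C -> D, C -> E): P(true | parents).
P_a = 0.001
P_b = {True: 0.3, False: 0.001}                       # P(b | A)
P_c = {True: 0.2, False: 0.005}                       # P(c | A)
P_d = {(True, True): 0.1, (True, False): 0.01,
       (False, True): 0.01, (False, False): 0.00001}  # P(d | B, C)
P_e = {True: 0.4, False: 0.002}                       # P(e | C)


def bern(p_true, value):
    """Probability of a binary value, given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d, e):
    """Chained CPDs: P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)."""
    return (bern(P_a, a) * bern(P_b[a], b) * bern(P_c[a], c)
            * bern(P_d[(b, c)], d) * bern(P_e[c], e))


def map_given_e(e=True):
    """Brute-force MAP: argmax over (A, B, C, D) of P(A, B, C, D | E = e).

    P(u | e) is proportional to P(u, e), so the normalizer P(e) can be skipped.
    """
    return max(product((True, False), repeat=4), key=lambda u: joint(*u, e))


print(map_given_e())  # (False, False, False, False)
```

The search visits all 2^4 instantiations, which is why larger networks need genuine optimization methods. For these numbers the winner has C false, even though the posterior marginal P(c | e) is slightly above 0.5. The most probable joint instantiation need not agree with the individually most probable values.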


Approaches to inference

• Exact inference
  – Enumeration
  – Variable elimination
  – Belief propagation in polytrees (singly connected BNs)
  – Clustering / join tree algorithms
• Approximate inference
  – Stochastic simulation / sampling methods
  – Markov chain Monte Carlo methods
  – Loopy propagation
  – Mean field theory
  – Simulated annealing
  – Genetic algorithms
  – Neural networks


Inference by enumeration

• Instead of computing the joint, suppose we just want the probability for one variable

• Add all of the terms (atomic event probabilities) from the full joint distribution

• If E are the evidence (observed) variables and Y are the other (unobserved) variables, excluding X, then the posterior distribution is
  P(X | E=e) = α P(X, e) = α Σ_y P(X, e, y)
  – The sum is over all possible instantiations y of the variables in Y
  – α is the normalization factor
• Each P(X, e, y) term can be computed using the chain rule
• Computationally expensive!


Example: Enumeration

• Suppose we want P(D = true), and only the value of E is given, as true
• P(d|e) = α Σ_{A,B,C} P(A, B, C, d, e)
         = α Σ_{A,B,C} P(A) P(B|A) P(C|A) P(d|B,C) P(e|C)
         = α [P(a)P(b|a)P(c|a)P(d|b,c)P(e|c) + P(a)P(b|a)P(~c|a)P(d|b,~c)P(e|~c)
            + P(a)P(~b|a)P(c|a)P(d|~b,c)P(e|c) + P(a)P(~b|a)P(~c|a)P(d|~b,~c)P(e|~c)
            + P(~a)P(b|~a)P(c|~a)P(d|b,c)P(e|c) + P(~a)P(b|~a)P(~c|~a)P(d|b,~c)P(e|~c)
            + P(~a)P(~b|~a)P(c|~a)P(d|~b,c)P(e|c) + P(~a)P(~b|~a)P(~c|~a)P(d|~b,~c)P(e|~c)]
• P(~d|e) = α Σ_{A,B,C} P(A, B, C, ~d, e)
• α = 1 / (Σ_{A,B,C} P(A, B, C, d, e) + Σ_{A,B,C} P(A, B, C, ~d, e)) = 1 / P(e), so that P(d|e) + P(~d|e) = 1
• With simple iteration to compute this expression, there is going to be a lot of repetition (e.g., P(e|c) has to be recomputed every time we iterate over C for all possible assignments of A and B)

[Figure: the example network A -> B, A -> C, B -> D, C -> D, C -> E]
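The enumeration above can be written as code for the example network's CPTs. This is a minimal sketch; the helper names are hypothetical. It sums the chained joint over A, B, and C for each value of D and then normalizes, so α = 1/P(e) is computed rather than assumed.

```python
from itertools import product

# CPTs of the example network (A -> B, A -> C, B -> D, C -> D, C -> E): P(true | parents).
P_a = 0.001
P_b = {True: 0.3, False: 0.001}                       # P(b | A)
P_c = {True: 0.2, False: 0.005}                       # P(c | A)
P_d = {(True, True): 0.1, (True, False): 0.01,
       (False, True): 0.01, (False, False): 0.00001}  # P(d | B, C)
P_e = {True: 0.4, False: 0.002}                       # P(e | C)


def bern(p_true, value):
    """Probability of a binary value, given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c, d, e):
    """Chained CPDs: P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)."""
    return (bern(P_a, a) * bern(P_b[a], b) * bern(P_c[a], c)
            * bern(P_d[(b, c)], d) * bern(P_e[c], e))


def query_d_given_e(e=True):
    """P(D | E=e) by enumeration: sum out A, B, C, then normalize with alpha."""
    unnormalized = {}
    for d in (True, False):
        unnormalized[d] = sum(joint(a, b, c, d, e)
                              for a, b, c in product((True, False), repeat=3))
    alpha = 1.0 / sum(unnormalized.values())  # alpha = 1 / P(e)
    return {d: alpha * p for d, p in unnormalized.items()}


posterior = query_d_given_e()
print(posterior[True], posterior[False])  # P(d|e) is roughly 0.0057; the two sum to 1
```

Each of the 16 joint terms recomputes its factors from scratch. That is the repetition noted above, and variable elimination avoids it by caching intermediate sums.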


Belief Propagation

• Singly connected network (also known as a polytree): there is at most one undirected path between any two nodes (i.e., the network is a tree if the directions of the arcs are ignored)
  – The influence of the instantiated variable (evidence) spreads to the rest of the network along the arcs

[Figure: a polytree over nodes A, B, C, D, E, F, with evidence E = e]

• The instantiated variable influences its predecessors and successors differently (using the CPTs along opposite directions)
• Computation is linear in the diameter of the network (the longest undirected path)
• Update the belief (posterior) of every non-evidence node in one pass
  – For multiply connected networks: conditioning


Conditioning

• Conditioning: Find the network’s smallest cutset S (a set of nodes whose removal renders the network singly connected)

– In this network, S = {A} or {B} or {C} or {D}

• For each instantiation of S, compute the belief update with the belief propagation algorithm

• Combine the results from all instantiations of S (each is weighted by P(S = s))

• Computationally expensive (finding the smallest cutset is in general NP-hard, and the total number of possible instantiations of S is O(2^|S|))

[Figure: the example network A -> B, A -> C, B -> D, C -> D, C -> E]


Junction Tree

• Convert a BN to a junction tree
  – Moralization: add an undirected edge between every pair of parents of each node, then drop the directions of all arcs: moralized graph
  – Triangulation: add an edge (a chord) to any cycle of length > 3 that lacks one: triangulated graph
  – A junction tree is a tree of cliques of the triangulated graph
  – Cliques are connected by links
    • A link stands for the set of all variables S shared by the two cliques
• Each clique has a CPT, constructed from the CPTs of the variables in the original BN
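Moralization is the mechanical first step of the conversion. This is a minimal sketch for the earlier example network (A -> B, A -> C, B -> D, C -> D, C -> E); the dictionary encoding and the function name are hypothetical. Only moralization is shown. Triangulation and clique extraction are left out.

```python
from itertools import combinations

# The example network as node -> list of parents.
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}


def moralize(parents):
    """Return the moral graph as a set of undirected edges (frozensets of two nodes)."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:                       # keep every arc, dropping its direction
            edges.add(frozenset((p, child)))
        for p, q in combinations(ps, 2):   # "marry" each pair of co-parents
            edges.add(frozenset((p, q)))
    return edges


moral = moralize(PARENTS)
print(sorted(tuple(sorted(e)) for e in moral))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'), ('C', 'D'), ('C', 'E')]
```

In this network the only co-parents are B and C (both parents of D), so moralization adds the single edge B-C. That edge is also a chord of the 4-cycle A-B-D-C, so the moral graph is already triangulated, and its cliques are {A, B, C}, {B, C, D}, and {C, E}.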


Junction Tree

• Reasoning
  – Since it is now a tree, the polytree algorithm can be applied, but now two cliques exchange P(S), the distribution of S
  – Complexity:
    • O(n) steps, where n is the number of cliques
    • Each step is expensive if cliques are large (CPT size is exponential in clique size)
    • Construction of the CPTs of the JT is expensive as well, but it needs to be done only once


Approximate inference: Direct sampling

• Suppose you are given values for some subset of the variables, E, and want to infer values for unknown variables, Z

• Randomly generate a very large number of instantiations from the BN
  – Generate instantiations for all variables: start at root variables and work your way “forward” in topological order

• Rejection sampling: Only keep those instantiations that are consistent with the values for E

• Use the frequency of values for Z to get estimated probabilities

• Accuracy of the results depends on the size of the sample (asymptotically approaches exact results)

• Very expensive
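Direct sampling with rejection can be sketched for the example network, estimating P(c | e). This is a minimal sketch: the CPTs come from the example slide, while the function names, the sample size, and the seed are arbitrary choices. Because P(e) is only about 0.004, the vast majority of samples are rejected, which illustrates why the method is expensive.

```python
import random

# CPTs of the example network (A -> B, A -> C, B -> D, C -> D, C -> E): P(true | parents).
P_a = 0.001
P_b = {True: 0.3, False: 0.001}
P_c = {True: 0.2, False: 0.005}
P_d = {(True, True): 0.1, (True, False): 0.01,
       (False, True): 0.01, (False, False): 0.00001}
P_e = {True: 0.4, False: 0.002}


def sample(rng):
    """Forward-sample one instantiation, roots first (topological order A, B, C, D, E)."""
    a = rng.random() < P_a
    b = rng.random() < P_b[a]
    c = rng.random() < P_c[a]
    d = rng.random() < P_d[(b, c)]
    e = rng.random() < P_e[c]
    return {"A": a, "B": b, "C": c, "D": d, "E": e}


def rejection_sampling(query, evidence, n, seed=0):
    """Estimate P(query = true | evidence) from the samples consistent with the evidence."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n):
        s = sample(rng)
        if all(s[var] == val for var, val in evidence.items()):
            kept += 1
            hits += s[query]
    return (hits / kept if kept else float("nan")), kept


estimate, kept = rejection_sampling("C", {"E": True}, n=200_000)
print(estimate, kept)  # only a few hundred of the 200,000 samples survive
```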


Markov chain Monte Carlo algorithm

• So called because
  – Markov chain: each instance generated in the sample is dependent on the previous instance
  – Monte Carlo: statistical sampling method
• Perform a random walk through variable assignment space, collecting statistics as you go
  – Start with a random instantiation, consistent with evidence variables
  – At each step, for some nonevidence variable x, randomly sample its value by
    $$P(x \mid mb(x)) = \alpha\, P(x \mid parents(x)) \prod_{Y \in children(X)} P(y \mid parents(Y))$$
    where α is a normalization constant
• Given enough samples, MCMC gives an accurate estimate of the true distribution of values (no need for importance sampling because of the Markov blanket)
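A minimal Gibbs-sampling sketch (one common MCMC scheme), assuming the example network's CPTs and a hypothetical dictionary encoding. Each nonevidence variable is resampled from P(x | mb(x)), computed with the Markov-blanket formula above and normalized over x’s two values. The burn-in length, sample count, and seed are arbitrary choices.

```python
import random

# The example network: node -> parents, and P(node = true | parent values).
PARENTS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"], "E": ["C"]}
CPT = {
    "A": {(): 0.001},
    "B": {(True,): 0.3, (False,): 0.001},
    "C": {(True,): 0.2, (False,): 0.005},
    "D": {(True, True): 0.1, (True, False): 0.01,
          (False, True): 0.01, (False, False): 0.00001},
    "E": {(True,): 0.4, (False,): 0.002},
}
CHILDREN = {v: [c for c, ps in PARENTS.items() if v in ps] for v in PARENTS}


def local(var, state):
    """P(var = state[var] | parents(var)), read from the CPT."""
    p_true = CPT[var][tuple(state[p] for p in PARENTS[var])]
    return p_true if state[var] else 1.0 - p_true


def gibbs(query, evidence, n, burn_in=1_000, seed=0):
    """Estimate P(query = true | evidence) by Gibbs sampling."""
    rng = random.Random(seed)
    # Start from a random instantiation that agrees with the evidence.
    state = {v: evidence[v] if v in evidence else rng.random() < 0.5 for v in PARENTS}
    free = [v for v in PARENTS if v not in evidence]
    hits = 0
    for step in range(burn_in + n):
        for x in free:
            weights = {}
            for value in (True, False):
                state[x] = value
                w = local(x, state)          # P(x | parents(x))
                for y in CHILDREN[x]:
                    w *= local(y, state)     # times P(y | parents(y)) for each child
                weights[value] = w
            # Normalizing over the two values plays the role of alpha.
            state[x] = rng.random() < weights[True] / (weights[True] + weights[False])
        if step >= burn_in:
            hits += state[query]
    return hits / n


estimate = gibbs("C", {"E": True}, n=20_000)
print(estimate)  # close to the exact P(c | e), which is about 0.51 for these CPTs
```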


Loopy Propagation

• Belief propagation
  – Works only for polytrees (exact solution)
  – Each piece of evidence propagates once throughout the network
• Loopy propagation
  – Let propagation continue until the network stabilizes (hopefully)
• Experiments show
  – Many BNs stabilize with loopy propagation
  – If it stabilizes, it often yields exact or very good approximate solutions
• Analysis
  – Conditions for convergence and quality of approximation are under intense investigation


Learning BN (from case data)

• Need for learning
  – Experts’ opinions are often biased, inaccurate, and incomplete
  – Large databases of cases have become available
• What to learn
  – Parameter learning: learning CPTs when the DAG is known (easy)
  – Structural learning: learning the DAG (hard)
• Difficulties in learning the DAG from case data
  – There are too many possible DAGs when the number of variables is large (more than exponential):
      n     # of possible DAGs
      3     25
      10    4*10^18
  – Missing values in the database
  – Noisy data
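The growth in the table can be reproduced with Robinson's recurrence for the number of labeled DAGs on n nodes. This is a standard counting result that the slides do not derive: a(0) = 1, and a(n) = Σ_{k=1..n} (-1)^(k+1) C(n, k) 2^(k(n-k)) a(n-k). A minimal sketch:

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))


for n in (3, 10):
    print(n, num_dags(n))  # 3 -> 25; 10 -> roughly 4.2 * 10^18
```

Python integers are unbounded, so the alternating sum is exact even for large n.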


BN Learning Approaches

• Bayesian approach (Cooper)
  – Find the most probable DAG, given database DB, i.e., max(P(DAG | DB)) or max(P(DAG, DB))
  – Based on some assumptions, a formula is developed to compute P(DAG, DB) for a given pair of DAG and DB
  – A hill-climbing algorithm (K2) is developed to search for a (sub)optimal DAG
  – Extensions handle some forms of missing values


• Minimum description length (MDL) (Lam, etc.)
  – Sacrifices accuracy for a simpler (less dense) structure
    • Case data are not always accurate
    • Fewer links imply smaller CPD tables and less expensive inference
  – L = L1 + L2, where
    • L1: the length of the encoding of the DAG (smaller for a simpler DAG)
    • L2: the length of the encoding of the difference between the DAG and DB (smaller for a better match of the DAG with DB)
    • A smaller L2 implies a more accurate (and more complex) DAG, and thus a larger L1
  – Find the DAG that minimizes L by heuristic best-first search


Other formalisms for uncertainty: fuzzy sets and fuzzy logic

• Ordinary set theory
  – Predicate:
    $$A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise} \end{cases}$$
  – When it is uncertain whether x ∈ A, use the probability P(x ∈ A)
  – $$f_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{otherwise} \end{cases}$$
    $f_A(x)$ is called the characteristic or membership function of set A
  – There are sets that are described by vague linguistic terms (sets without hard, clearly defined boundaries), e.g., tall-person, fast-car
    • Continuous
    • Subjective (context dependent)
    • Hard to define a clear-cut 0/1 membership function


• Fuzzy set theory
  – Relax $f_A(x)$ from binary {0, 1} to continuous [0, 1]; $f_A(x)$ stands for the degree to which x is thought to belong to set A
    height(john) = 6’5”     Tall(john) = 0.9
    height(harry) = 5’8”    Tall(harry) = 0.5
    height(joe) = 5’1”      Tall(joe) = 0.1
  – Examples of membership functions

[Figure: membership functions (from 0 to 1) for the set of teenagers and the set of young people (age axis marked 0, 12, 19), and for the set of mid-age people (age axis marked 20, 35, 50, 65, 80)]
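A hypothetical membership function for Tall can be built from the three examples. This minimal sketch assumes piecewise-linear interpolation through those points (heights converted to inches: 5’1” = 61, 5’8” = 68, 6’5” = 77) and clamping outside that range. The shape between the points is an assumption, not something the slides specify.

```python
# Hypothetical anchor points (height in inches -> degree of membership in Tall),
# taken from the examples: joe 5'1", harry 5'8", john 6'5".
ANCHORS = [(61, 0.1), (68, 0.5), (77, 0.9)]


def tall(height_in):
    """Piecewise-linear membership f_Tall(x) in [0, 1], clamped outside the anchors."""
    if height_in <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    if height_in >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= height_in <= x1:
            t = (height_in - x0) / (x1 - x0)  # position between the two anchors
            return y0 + t * (y1 - y0)


for name, inches in [("joe", 61), ("harry", 68), ("john", 77), ("5'6\" person", 66)]:
    print(name, round(tall(inches), 3))
```

Unlike a 0/1 characteristic function, a 5’6” person gets a partial degree of membership (about 0.39 under this assumed shape).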


• Fuzzy logic: many-valued logic
  – Fuzzy predicates (degree of truth): $F_A(x) = y$ if $f_A(x) = y$
  – Connectors/operators:
    negation: $F_{\sim A}(x) = 1 - F_A(x)$
    conjunction: $F_{A \wedge B}(x) = \min\{F_A(x), F_B(x)\}$
    disjunction: $F_{A \vee B}(x) = \max\{F_A(x), F_B(x)\}$
• Compare with probability theory
  – Prob.: uncertainty of outcome
    • Based on a large number of repetitions or instances
    • For each experiment (instance), the outcome is either true or false (without uncertainty or ambiguity): unsure before it happens, but sure after it happens
  – Fuzzy: vagueness of conceptual/linguistic characteristics
    • Unsure even after it happens: whether a child of a tall mother and a short father is tall is unsure before the child is born, and still unsure after the child has grown up (height = 5’6”)


  – Empirical vs. subjective (testable vs. agreeable)
  – Fuzzy set operations may lead to unreasonable results
    • Consider two events A and B with P(A) < P(B)
    • If A => B (or A ⊆ B), then
      P(A ^ B) = P(A) = min{P(A), P(B)}
      P(A v B) = P(B) = max{P(A), P(B)}
    • This is not the case in general:
      P(A ^ B) = P(A) P(B|A) ≤ P(A)
      P(A v B) = P(A) + P(B) - P(A ^ B) ≥ P(B)
      (equality holds only if P(B|A) = 1, i.e., A => B)
  – Something prob. theory cannot represent
    • Tall(john) = 0.9, ~Tall(john) = 0.1
      Tall(john) ^ ~Tall(john) = min{0.1, 0.9} = 0.1: john’s degree of membership in the fuzzy set of “medium-height people” (both Tall and not-Tall)
    • In prob. theory: P(john ∈ Tall ^ john ∉ Tall) = 0
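The fuzzy connectives and the john example fit in a few lines. This is a minimal sketch; the function names `f_not`, `f_and`, and `f_or` are hypothetical, and they implement the 1 - x, min, and max operators defined earlier. Note that the fuzzy versions of "A and not A" and "A or not A" are not forced to 0 and 1.

```python
def f_not(a):
    """Fuzzy negation: F_~A(x) = 1 - F_A(x)."""
    return 1.0 - a


def f_and(a, b):
    """Fuzzy conjunction: min of the two degrees."""
    return min(a, b)


def f_or(a, b):
    """Fuzzy disjunction: max of the two degrees."""
    return max(a, b)


tall_john = 0.9
both = f_and(tall_john, f_not(tall_john))    # Tall(john) ^ ~Tall(john)
either = f_or(tall_john, f_not(tall_john))   # Tall(john) v ~Tall(john)
print(round(both, 3), round(either, 3))      # 0.1 0.9
# In probability theory the same expressions are always 0 and 1:
# P(A ^ ~A) = 0 and P(A v ~A) = 1, whatever P(A) is.
```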


Uncertainty in rule-based systems

• Elements in working memory (WM) may be uncertain because
  – Case input (initial elements in WM) may be uncertain
    Ex: the CD drive does not work 70% of the time
  – A decision from a rule application may be uncertain even if the rule’s conditions are met by WM with certainty
    Ex: flu => sore throat with high probability
• Combining symbolic rules with numeric uncertainty: Mycin’s certainty factor (CF)
  – An early attempt to incorporate uncertainty into KB systems
  – CF ∈ [-1, 1]
  – Each element in WM is associated with a CF: the certainty of that assertion
  – Each rule C1, ..., Cn => Conclusion is associated with a CF: the certainty of the association (between C1, ..., Cn and Conclusion)


  – CF propagation:
    • Within a rule: if each condition Ci has certainty CFi, then the certainty of the Conclusion is min{CF1, ..., CFn} * CF-of-the-rule
    • When more than one rule can apply to the current WM for the same Conclusion with different CFs, the largest of these CFs is assigned as the CF for the Conclusion
    • Similar to the fuzzy rules for conjunctions and disjunctions
  – Good things about Mycin’s CF method
    • Easy to use
    • CF operations are reasonable in many applications
    • Probably the only method for uncertainty used in real-world rule-based systems
  – Limitations
    • It is in essence an ad hoc method (it can be viewed as a probabilistic inference system with some strong, sometimes unreasonable, assumptions)
    • May produce counter-intuitive results
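The CF propagation scheme as described on this slide takes the min within a rule and the max across rules for the same conclusion. This minimal sketch uses hypothetical working-memory contents, rules, and function names. It follows the slide's simplified max rule rather than any particular Mycin implementation.

```python
# Hypothetical working memory: fact -> CF in [-1, 1].
wm = {"fever": 0.8, "cough": 0.6, "sore_throat": 0.9}

# Hypothetical rules: (conditions C1..Cn, conclusion, CF of the rule).
rules = [
    (["fever", "cough"], "flu", 0.7),
    (["sore_throat"], "flu", 0.5),
]


def rule_cf(conditions, rule_strength, wm):
    """Within a rule: min{CF1, ..., CFn} * CF-of-the-rule."""
    return min(wm[c] for c in conditions) * rule_strength


def conclude(target, rules, wm):
    """Across applicable rules for the same conclusion: keep the largest CF."""
    cfs = [rule_cf(conds, strength, wm)
           for conds, conclusion, strength in rules
           if conclusion == target and all(c in wm for c in conds)]
    return max(cfs) if cfs else None


print(round(conclude("flu", rules, wm), 3))  # 0.45 = max(0.6 * 0.7, 0.9 * 0.5)
```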


Dempster-Shafer theory

• A variation of Bayes’ theorem to represent ignorance
• Uncertainty and ignorance
  – Suppose two events A and B are ME and EXH, given evidence E
    A: having cancer   B: not having cancer   E: smoking
  – By Bayes’ theorem, our beliefs in A and B, given E, are measured by P(A|E) and P(B|E), and P(A|E) + P(B|E) = 1
  – In reality,
    I may have some belief in A, given E
    I may have some belief in B, given E
    I may have some belief not committed to either one
  – The uncommitted belief (ignorance) should not be given to either A or B, even though I know one of the two must be true; rather, it should be given to “A or B”, denoted {A, B}
  – Uncommitted belief may be given to A and B when new evidence is discovered


• Representing ignorance
  – Frame of discernment: $\Theta = \{h_1, \dots, h_n\}$, a set of ME and EXH hypotheses
  – The power set $2^\Theta$ is organized as a lattice of super/subset relations; each node S is a subset of hypotheses ($S \subseteq \Theta$)
  – Ex: Θ = {A, B, C}
      {A,B,C}: 0.15
      {A,B}: 0.1    {A,C}: 0.1    {B,C}: 0.05
      {A}: 0.1      {B}: 0.2      {C}: 0.3
      {}: 0
  – Each node S is associated with a basic probability assignment m(S):
    $$0 \le m(S) \le 1; \quad m(\emptyset) = 0; \quad \sum_{S \subseteq \Theta} m(S) = 1$$
• Belief function
    $$Bel(S) = \sum_{S' \subseteq S} m(S'); \quad Bel(\emptyset) = 0; \quad Bel(\Theta) = 1$$
    $$Bel(\{A, B\}) = m(\{A\}) + m(\{B\}) + m(\{A, B\}) = 0.1 + 0.2 + 0.1 = 0.4; \quad Bel(\{C\}) = m(\{C\}) = 0.3$$


  – Plausibility (upper bound of the belief of a node): all belief not committed to $S^C$ may be committed to S
    $$Pls(S) = 1 - Bel(S^C)$$
    e.g., $Pls(\{A, B\}) = 1 - Bel(\{C\}) = 1 - 0.3 = 0.7$
  – Belief interval: $[Bel(S), Pls(S)]$
    • Lower bound Bel(S): known belief
    • Upper bound Pls(S): maximally possible belief
  – Methods have been developed to combine the effect of multiple pieces of evidence (belief update by new evidence)
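Belief and plausibility for the example lattice can be computed directly from the basic probability assignment. This is a minimal sketch; the frozenset encoding and the names `bel` and `pls` are hypothetical. Subsets with zero mass (such as the empty set) are simply omitted from the dictionary.

```python
# Basic probability assignment m(S) from the example lattice over the frame {A, B, C}.
m = {
    frozenset("ABC"): 0.15,
    frozenset("AB"): 0.10, frozenset("AC"): 0.10, frozenset("BC"): 0.05,
    frozenset("A"): 0.10, frozenset("B"): 0.20, frozenset("C"): 0.30,
}
FRAME = frozenset("ABC")


def bel(s):
    """Bel(S): total mass committed to subsets of S."""
    return sum(mass for subset, mass in m.items() if subset <= s)


def pls(s):
    """Pls(S) = 1 - Bel(complement of S)."""
    return 1.0 - bel(FRAME - s)


S = frozenset("AB")
print(round(bel(S), 3), round(pls(S), 3))  # 0.4 0.7 -> belief interval [0.4, 0.7]
```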


• Advantages:
  – The only formal theory about ignorance
  – A disciplined way to handle evidence combination
• Disadvantages:
  – Computationally very expensive (lattice size 2^|Θ|)
  – Assumes hypotheses are ME and EXH
  – How to obtain m(.) for each piece of evidence is not clear, except subjectively