

Expert Systems 8 1

Bayesian Networks

1. Probability theory
2. BN as knowledge model
3. Bayes in Court
4. Dazzle examples
5. Conclusions

Reverend Thomas Bayes (1702-1761)

Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak, VU Amsterdam, September 2004. http://www.few.vu.nl/onderwijs/stage/werkstuk/werkstukken/werkstuk-ijzerman.doc


Expert Systems 8 2

Thought Experiment: Hypothesis Selection

Imagine two types of bag:
• BagA: 250 + 750
• BagB: 750 + 250

Take 5 balls from a bag:
• Result: 4 + 1
What is the type of the bag?

Probability of this result from:
• BagA: 0.0144
• BagB: 0.396
Conclusion: The bag is BagB.

But…
• We don’t know how the bag was selected
• We don’t even know that type BagB exists

• Experiment is meaningful only in light of the a priori posed hypotheses (BagA, BagB) and their assumed likelihoods.
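
A minimal sketch (not part of the original slides) that reproduces the two likelihoods above, assuming each bag holds 1000 balls of the two colours listed and the 5 balls are drawn without replacement:

```python
from math import comb

# Hypergeometric likelihood of drawing 4 balls of the first colour and 1 of
# the second, assuming 1000 balls per bag and drawing without replacement.
def likelihood(first_colour_in_bag, total=1000, draws=5, first_colour_drawn=4):
    second_colour_in_bag = total - first_colour_in_bag
    return (comb(first_colour_in_bag, first_colour_drawn)
            * comb(second_colour_in_bag, draws - first_colour_drawn)
            / comb(total, draws))

print(round(likelihood(250), 4))  # BagA (250 + 750) -> 0.0144
print(round(likelihood(750), 4))  # BagB (750 + 250) -> 0.3963 (~0.396)
```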


Expert Systems 8 3

Classical and Bayesian statistics

Classical statistics:
• Compute the prob for your data, assuming a hypothesis
• Reject a hypothesis if the data becomes unlikely

Bayesian statistics:
• Compute the prob for a hypothesis, given your data
• Requires a priori prob for each hypothesis; these are extremely important!


Expert Systems 8 4

Part I: Probability theory

What is a probability?
• Frequentist: relative frequency of occurrence.
• Subjectivist: amount of belief.
• Mathematician: axioms (Kolmogorov), assignment of non-negative numbers to a set of states, sum 1 (100%).

State has several variables: product space.

With n binary variables: 2^n states.

Multi-valued variables.

              Blond   Not blond
              30      70

              Blond   Not blond
Mother blond  15      15
Mother n.b.   15      55


Expert Systems 8 5

Conditional Probability: Using evidence

• First table: probability for any woman to deliver a blond baby
• Second table: describes blond and non-blond mothers separately
• Third table: describes only blond mothers

A row is rescaled with its weight; def. conditional probability:

Pr(A|B) = Pr(A & B) / Pr(B)

Rewrite: Pr(A & B) = Pr(B) x Pr(A|B)

              Blond   Not blond
              30      70

              Blond   Not blond
Mother blond  15      15
Mother n.b.   15      55

              Blond   Not blond
Mother blond  50      50
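
A small sketch (an addition, not from the slides) that derives the third table’s 50/50 from the second table via the definition above:

```python
# Counts from the second table: (child hair, mother hair) -> number per 100.
counts = {
    ("blond", "blond"): 15, ("not blond", "blond"): 15,
    ("blond", "not blond"): 15, ("not blond", "not blond"): 55,
}
total = sum(counts.values())

p_blond_child_and_blond_mother = counts[("blond", "blond")] / total               # 0.15
p_blond_mother = sum(n for (c, m), n in counts.items() if m == "blond") / total   # 0.30

# Pr(blond child | blond mother) = Pr(A & B) / Pr(B) = 0.5, i.e. the rescaled row.
print(p_blond_child_and_blond_mother / p_blond_mother)
```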


Expert Systems 8 6

Dependence and Independence

• The prob for a blond child is 30%, but larger for a blond mother and smaller for a non-blond mother.

• The prob for a boy is 50%, also for blond mothers, and also for non-blond mothers.

Def.: A and B are independent: Pr(A|B) = Pr(A)

Exercise: Show that
Pr(A|B) = Pr(A)
is equivalent to
Pr(B|A) = Pr(B)
(aka B and A are independent).

              Blond   Not blond
Mother blond  15      15
Mother n.b.   15      55

              Boy   Girl
Mother blond  15    15
Mother n.b.   35    35

              Boy   Girl
Mother blond  50    50


Expert Systems 8 7

Bayes’ Rule: from data to hypothesis

        4 + 1    Other
BagA    0.0144   0.986
BagB    0.396    0.604
Other

• Classical Probability Theory: 0.0144 is the relative weight of 4+1 in the ROW of BagA.

• Bayesian Theory describes the distribution over the COLUMN of 4+1.

Classical statistics: ROW distribution

Bayesian statistics: COLUMN distribution

Bayes’ Rule:
• Observe that
Pr(A & B) = Pr(A) x Pr(B|A) = Pr(B) x Pr(A|B)

• Conclude Bayes’ Rule:
Pr(A|B) = Pr(B|A) x Pr(A) / Pr(B)
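
A small sketch (added here; the slides do not give priors) applying Bayes’ Rule to the bag experiment, assuming an equal 50/50 prior over the two hypotheses:

```python
# Row likelihoods from the table above and an assumed 50/50 prior.
likelihood = {"BagA": 0.0144, "BagB": 0.396}
prior = {"BagA": 0.5, "BagB": 0.5}

evidence = sum(likelihood[h] * prior[h] for h in prior)             # Pr(4+1)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)  # BagA ~ 0.035, BagB ~ 0.965 under these priors
```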


Expert Systems 8 8

Reasons for Dependence 1: Causality

• Dependency: P(B|A) ≠ P(B)
• Positive correlation: P(B|A) > P(B)
• Negative correlation: P(B|A) < P(B)

Possible explanation: A causes B.

Example:
P(headache) = 6%
P(ha | party) = 10%
P(ha | ¬party) = 2%

           h.a.   no h.a.
party      5      45
no party   1      49

Alternative explanation: B causes A.

In the same example:
P(party) = 50%
P(party | h.a.) = 83%
P(party | no h.a.) = 48%

“Headaches make students go to parties.”

In statistics, correlation has no direction.


Expert Systems 8 9

Reasons for Dependence 2: Common cause

1. The student party may lead to headache and is costly (money versus broke). Cells give the total count and, in parentheses, the money/broke split:

           h.a.      no h.a.
party      5 (2-3)   45 (18-27)
no party   1 (1-0)   49 (49-0)

2. Table of headache and money:

         h.a.   no h.a.
money    3      67
broke    3      27

Pr(broke) = 30%
Pr(broke | h.a.) = 50%

3. Table of headache and money for party attendants:

         h.a.   no h.a.
money    2      18
broke    3      27

This dependency disappears if the common cause variable is known
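
A brief sketch (added) that checks these numbers: the headache/broke dependence vanishes once the common cause "party" is fixed:

```python
# Counts per 100 students, keyed by (party?, headache?); values are
# (number with money, number broke) within that cell.
table = {
    ("party", "ha"): (2, 3),    ("party", "no ha"): (18, 27),
    ("no party", "ha"): (1, 0), ("no party", "no ha"): (49, 0),
}

def p_broke(cells):
    money = sum(table[c][0] for c in cells)
    broke = sum(table[c][1] for c in cells)
    return broke / (money + broke)

print(p_broke(table))                                   # Pr(broke)             = 0.30
print(p_broke([c for c in table if c[1] == "ha"]))      # Pr(broke | h.a.)      = 0.50
print(p_broke([c for c in table if c[0] == "party"]))   # Pr(broke | party)     = 0.60
print(p_broke([("party", "ha")]))                       # Pr(broke | party, ha) = 0.60
```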


Expert Systems 8 10

Reasons for Dependence 3: Common effect

A and B are independent:
Pr(B) = 80%
Pr(B|A) = 80%
B and A are independent.

Their combination stimulates C; for instances satisfying C:
Pr(B) = 90%
Pr(B|A) = 93%, Pr(B|¬A) = 80%

All cases (in parentheses: how many satisfy C):

         A         non A
B        40 (14)   40 (4)
non B    10 (1)    10 (1)

Only the cases satisfying C:

         A    non A
B        14   4
non B    1    1

This dependency appears if the common effect variable is known
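
A short sketch (added) verifying the explaining-away effect from these two tables:

```python
# Overall counts and, separately, the counts restricted to C.
all_counts = {("A", "B"): 40, ("A", "non B"): 10, ("non A", "B"): 40, ("non A", "non B"): 10}
c_counts   = {("A", "B"): 14, ("A", "non B"): 1,  ("non A", "B"): 4,  ("non A", "non B"): 1}

def p_b(counts, keep=lambda cell: True):
    sel = {cell: n for cell, n in counts.items() if keep(cell)}
    return sum(n for (a, b), n in sel.items() if b == "B") / sum(sel.values())

print(p_b(all_counts))                              # Pr(B)         = 0.80
print(p_b(all_counts, lambda c: c[0] == "A"))       # Pr(B | A)     = 0.80  (independent)
print(p_b(c_counts))                                # Pr(B | C)     = 0.90
print(p_b(c_counts, lambda c: c[0] == "A"))         # Pr(B | A, C)  ~ 0.93  (dependent)
print(p_b(c_counts, lambda c: c[0] == "non A"))     # Pr(B | ¬A, C) = 0.80
```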


Expert Systems 8 11

Part II: Bayesian Networks

• Probabilistic Graphical Model
• Probabilistic Network
• Bayesian Network
• Belief Network

Consists of:
• Variables (n)
• Domains (here binary)
• Acyclic arc set, modeling the statistical influences
• Per variable V (in-degree k): Pr(V | E), for the 2^k cases of E

Information in a node: exponential in in-degree.

[Graph: pa → ha, pa → br]

Pr    -
pa    50%

Pr    pa    ¬pa
ha    10%   2%

Pr    pa    ¬pa
br    40%   0%

[Graph: A → C ← B]

Pr    A,B   A,¬B   ¬A,B   ¬A,¬B
C     56%   10%    10%    10%
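
A minimal sketch (added) of how these two example networks could be stored as plain data, one conditional probability table per node; the dict layout is my own assumption, not Dazzle’s format:

```python
# Each node: its parents and Pr(node = true | parent values), one entry
# per combination of parent values (2^k rows for in-degree k).
party_network = {
    "pa": {"parents": [], "cpt": {(): 0.50}},
    "ha": {"parents": ["pa"], "cpt": {(True,): 0.10, (False,): 0.02}},
    "br": {"parents": ["pa"], "cpt": {(True,): 0.40, (False,): 0.00}},
}

abc_network = {
    # Priors for A and B are assumed for illustration; the slide only
    # gives the table Pr(C | A, B).
    "A": {"parents": [], "cpt": {(): 0.50}},
    "B": {"parents": [], "cpt": {(): 0.80}},
    "C": {"parents": ["A", "B"], "cpt": {
        (True, True): 0.56, (True, False): 0.10,
        (False, True): 0.10, (False, False): 0.10,
    }},
}
```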


Expert Systems 8 12

The Bayesian Network Model: Closed World Assumption

• Rule based:
IF x attends party THEN x has headache WITH cf = 0.10
What if x didn’t attend?

• Bayesian model:

Direction of arcs and correlation

[Graph: pa → ha]

Pr    -
pa    50%

Pr    pa    ¬pa
ha    10%   2%

[Graph: ha → pa]

Pr    -
ha    6%

Pr    ha    ¬ha
pa    83%   48%

Pr(ha|¬pa) is included: claim all relevant info is modeled

1. BN does not necessarily model causality

2. Built upon Human Expert understanding of relationships; often causal


Expert Systems 8 13

A little theorem

• A Bayesian network on n binary variables uniquely defines a probability distribution over the associated set of 2^n states.

• Distribution has 2^n parameters (numbers in [0..1] with sum 1).

• Typical network has in-degree 2 to 3: represented by 4n to 8n parameters (PIGLET!!).

• Bayesian Networks are an efficient representation
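
A tiny sketch (added) of the parameter counts behind this claim, comparing the full joint with a BN of in-degree at most 3:

```python
# Full joint over n binary variables: about 2**n numbers.
# BN with in-degree <= 3: at most 2**3 = 8 numbers per node, i.e. 8n in total.
for n in (10, 20, 40):
    print(n, 2**n, 8 * n)
# e.g. n = 20: full joint ~1e6 numbers, BN only 160.
```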


Expert Systems 8 14

The Utrecht DSS group

• Initiated by Prof Linda van der Gaag from ~1990
• Focus: development of BN support tools
• Use experience from building several actual BNs
• Medical applications
• Oesoca, ~40 nodes
• Courses: Probabilistic Reasoning, Network Algorithms (Master ACS)


Expert Systems 8 15

How to obtain a BN model

Describe Human Expert knowledge:
Metastatic Cancer (MC) may be detected by an increased level of serum calcium (SC). A Brain Tumor (BT) may be seen on a CT scan (CT). Severe headaches (SH) are indicative of the presence of a brain tumor. Both a brain tumor and an increased level of serum calcium may bring the patient into a coma (Co).

Probabilities: Expert guess or statistical study

Learn BN structure automatically from data by means of Data Mining
• Research of Carsten
• Models not intuitive
• Not considered XS
• Helpful addition to Knowledge Acquisition from Human Expert
• Master ACS.

[Network figure with nodes: mc, sc, bt, co, sh, ct]


Expert Systems 8 16

Inference in Bayesian Networks

The probability of a state S = (v1, .., vn): multiply the Pr(vi | S) entries, i.e. each node’s table value for the parent values in S.

The marginal (overall) probability of each variable:

Sampling: Produce a series of cases, distributed according to the probability distribution implicit in the BN

[Graph: pa → ha, pa → br]

Pr    -
pa    50%

Pr    pa    ¬pa
ha    10%   2%

Pr    pa    ¬pa
br    40%   0%

Pr(pa, ¬ha, ¬br) = 0.50 * 0.90 * 0.60 = 0.27

Pr(pa) = 50%

Pr(ha) = 6%

Pr(br) = 20%
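
A short sketch (added) that reproduces the state probability and these marginals by brute-force enumeration of the pa/ha/br network:

```python
from itertools import product

prior_pa = 0.50
p_ha = {True: 0.10, False: 0.02}   # Pr(ha = true | pa)
p_br = {True: 0.40, False: 0.00}   # Pr(br = true | pa)

def joint(pa, ha, br):
    # Multiply each node's table entry for this state.
    p = prior_pa if pa else 1 - prior_pa
    p *= p_ha[pa] if ha else 1 - p_ha[pa]
    p *= p_br[pa] if br else 1 - p_br[pa]
    return p

print(joint(True, False, False))   # Pr(pa, ¬ha, ¬br) = 0.5 * 0.9 * 0.6 = 0.27

# Marginals: sum the joint over every state in which the variable is true.
states = list(product([True, False], repeat=3))
for i, name in enumerate(["pa", "ha", "br"]):
    print(name, sum(joint(*s) for s in states if s[i]))   # 0.50, 0.06, 0.20
```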


Expert Systems 8 17

Consultation: Entering Evidence

Consultation applies the BN knowledge to a specific case
• Known variable values can be entered into the network
• Probability tables for all nodes are updated

• Obtain (something like) a new BN modeling the conditional distribution

• Again, show distributions and state probabilities

• Backward and forward propagation
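
A small sketch (added) of entering evidence on the pa/ha/br network by brute force: condition the enumerated joint on the observed values and renormalise. Real tools such as Dazzle use far more efficient propagation algorithms.

```python
from itertools import product

prior_pa = 0.50
p_ha = {True: 0.10, False: 0.02}
p_br = {True: 0.40, False: 0.00}

def joint(pa, ha, br):
    p = prior_pa if pa else 1 - prior_pa
    p *= p_ha[pa] if ha else 1 - p_ha[pa]
    p *= p_br[pa] if br else 1 - p_br[pa]
    return p

def posterior(target, evidence):
    # Keep only the states consistent with the evidence, then renormalise.
    names = ["pa", "ha", "br"]
    states = [dict(zip(names, s)) for s in product([True, False], repeat=3)]
    consistent = [s for s in states if all(s[v] == val for v, val in evidence.items())]
    z = sum(joint(s["pa"], s["ha"], s["br"]) for s in consistent)
    return sum(joint(s["pa"], s["ha"], s["br"]) for s in consistent if s[target]) / z

print(posterior("pa", {"ha": True}))   # backward propagation: Pr(pa | ha) ~ 0.83
print(posterior("br", {"ha": True}))   # forward as well:      Pr(br | ha) ~ 0.33
```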


Expert Systems 8 18

Test Selection (Danielle)

• In consultation, enter data until goal variable is known with sufficient probability.

• Data items are obtained at specific cost.

• Data items influence the distribution of the goal.

Problem:
• Given the current state of the consultation, find out what is the best variable to test next.

Started CS study 1996, PhD thesis defense Oct 2005.


Expert Systems 8 19

Some more work done in Linda’s DSS group

• Sensitivity Analysis: numerical parameters in the BN may be inaccurate; how does this influence the consultation outcome?

• More efficient inferencing: inferencing is costly, especially in the presence of
  • Cycles (NB: there are no directed cycles!)
  • Nodes with a high in-degree
  Approximate reasoning, network decompositions, …

• Writing a program tool: Dazzle


Expert Systems 8 20

Part III: In the Courtroom

What happens in a trial?
• Prosecutor and Defense collect information
• Judge decides if there is sufficient evidence that the person is guilty

Forensic tests are far more conclusive than medical ones but still probabilistic in nature!

Pr(symptom|sick) = 80%
Pr(trace|innocent) = 0.01%

Tempting to forget statistics. Need a priori probabilities.

Pr(A|B) = Pr(B|A) x Pr(A) / Pr(B)

Jenneke IJzerman, Bayesiaanse Statistiek in de Rechtspraak, VU Amsterdam, September 2004.


Expert Systems 8 21

Prosecutor’s Fallacy

The story:
• A DNA sample was taken from the crime site
• Probability of a match of samples of different people is 1 in 10,000
• 20,000 inhabitants are sampled
• John’s DNA matches the sample
• Prosecutor: the chance that John is innocent is 1 in 10,000
• Judge convicts John

The analysis:
• The prosecutor confuses
  Pr(inn | evid)  (a)
  Pr(evid | inn)  (b)
• Forensic experts can only shed light on (b)
• The Judge must find (a); a priori probabilities are needed!! (Bayes)
• Dangerous to convict on DNA samples alone
• Pr(innocent match) = 86%
• Pr(1 such match) = 27%
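
A brief sketch (added) of one way to arrive at the last two numbers, assuming the 19,999 sampled people other than the offender are innocent and match independently with probability 1 in 10,000:

```python
p_match = 1 / 10_000
n_innocent = 20_000 - 1   # assumed: everyone sampled except the offender is innocent

p_no_innocent_match = (1 - p_match) ** n_innocent
print(1 - p_no_innocent_match)                                    # Pr(at least one innocent match) ~ 0.86
print(n_innocent * p_match * (1 - p_match) ** (n_innocent - 1))   # Pr(exactly one innocent match)  ~ 0.27
```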


Expert Systems 8 22

Defender’s Fallacy

The story:
• Town has 100,001 people
• We expect 11 to match (1 guilty plus 10 innocent)
• Probability that John is guilty is 9%
• John must be released

Implicit assumptions:
• Offender is from town
• Equal a priori probability for each inhabitant

It is necessary to take other circumstances into account; why was John prosecuted and what other evidence exists?

Conclusions:
• PF: it is necessary to take Bayes and a priori probs into account
• DF: estimating the a prioris is crucial for the outcome
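
A one-liner sketch (added) of the defender’s 9%, under the implicit assumptions listed above and a match probability of 1 in 10,000:

```python
population, p_match = 100_001, 1 / 10_000
expected_innocent_matches = (population - 1) * p_match   # ~10 innocent matches
print(1 / (1 + expected_innocent_matches))               # Pr(guilty | match) ~ 0.09
```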


Expert Systems 8 23

“Reports of experts containing their opinion concerning what their science teaches them about that which is submitted to their judgment” (verslagen van deskundigen behelzende hun gevoelen betreffende hetgeen hunne wetenschap hen leert omtrent datgene wat aan hun oordeel onderworpen is)

Experts’ and Judge’s task

IJzerman’s ideas about trial:
1. Forensic Expert may not claim a priori or a posteriori probabilities (Dutch Penalty Code, 344-1.4)
2. Judge must set a priori
3. Judge must compute a posteriori, based on statements of experts
4. Judge must have an explicit threshold of probability for beyond reasonable doubt
5. Threshold should be made explicit in law.

Is this realistic?
1. Avoid confusing Pr(G|E) and Pr(E|G): a good idea
2. A prioris are extremely important; this almost pre-determines the verdict
3. How is this done? Bayesian Network designed and controlled by the Judge?
4. No judge will obey a mathematical formula
5. Public agreement and acceptance?


Expert Systems 8 24

Bayesian Alcoholism Test

• Driving under influence of alcohol leads to a penalty
• Administrative procedure may void the licence
• Judge must decide if the subject is an alcohol addict; incidental or regular (harmful) drinking
• Psychiatrists advise the court by determining if drinking was incidental or regular
• Goal HHAU: Harmful and Hazardous Alcohol Use
• Probabilistically confirmed or denied by clinical tests
• Bayesian Alcoholism Test: developed 1999-2004 by A. Korzec, Amsterdam.


Expert Systems 8 25

Variables in Bayesian Alcoholism Test

Hidden variables:
• HHAU: alcoholism
• Liver disease

Observable causes:
• Hepatitis risk
• Social factors
• BMI, diabetes

Observable effects:
• Skin color
• Lab: blood, breath
• Level of Response
• Smoking
• CAGE questionnaire


Expert Systems 8 26

Knowledge Elicitation for BAT

Knowledge in the Network
• Qualitative
  - What variables are relevant
  - How do they interrelate
• Quantitative
  - A priori probabilities
  - Conditional probabilities for hidden diseases
  - Conditional probabilities for effects
  - Response of lab tests to hidden diseases

How it was obtained
• Network structure??
  IJzerman does not report about this
• Probabilities
  - Literature studies: 40% of probabilities
  - Expert opinions: 60% of probabilities


Expert Systems 8 27

Consultation with BAT

Enter evidence about subject:
• Clinical signs: skin, smoking, LRA; CAGE
• Lab results
• Social factors

The network will return:
• Probability that Subject has HHAU
• Probabilities for liver disease and diabetes

The responsible Human Medical Expert converts this probability to a YES/NO for the judge! (Interpretation phase)

The HME may take other data into account (rare disease).

Knowing what the CAGE is used for may influence the answers that the subject gives.


Expert Systems 8 28

Part IV: Bayes in the Field

The Dazzle program
• Tool for designing and analysing BNs
• Mouse-click the network; fill in the probabilities
• Consult by evidence submission
• Read posterior probabilities
• Development 2004-2006
• Written in Haskell
• Arjen van IJzendoorn, Martijn Schrage
• www.cs.uu.nl/dazzle


Expert Systems 8 29

Importance of a good model

In 1998, Donna Anthony (31) was convicted of murdering her two children. She was in prison for seven years but claimed her children died of cot death.

Prosecutor: The probability of two cot deaths in one family is too small, unless the mother is guilty.


Expert Systems 8 30

The Evidence against Donna Anthony

• BN with priors eliminates the Prosecutor’s Fallacy
• Enter the evidence: both children died
• A priori probability is very small (1 in 1,000,000)
• Dazzle establishes a 97.6% probability of guilt
• Name of expert: Prof. Sir Roy Meadow (1933)
• His testimony brought a dozen mothers to prison in a decade


Expert Systems 8 31

A More Refined Model

Allow for genetic or social circumstances for which the parent is not liable.


Expert Systems 8 32

The Evidence against Donna?

Refined model: genetic defect is the most likely cause of repeated deaths

Donna Anthony was released in 2005 after 7 years in prison

6/2005: Struck from GMC register
7/2005: Appeal by Meadow
2/2006: Granted; otherwise experts would refuse to testify


Expert Systems 8 33

Classical Swine Fever, Petra Geenen

• Swine Fever is a costly disease
• Development 2004/5
• 42 vars, 80 arcs
• 2454 Prs, but many are 0
• Pig/herd level
• Prior extremely small
• Probability elicitation with questionnaire


Expert Systems 8 34

Conclusions

• Mathematically sound model to reason with uncertainty
• Further studied in Probabilistic Reasoning (ACS)
• Applicable to areas where knowledge is highly statistical

• Acquisition: instead of classical IF a THEN b (WITH c), obtain both Pr(b|a) and Pr(b|¬a)

• More work but a more powerful model
• One formalism allows both diagnostic and prognostic reasoning

• Danger: apparent exactness is deceiving
• Disadvantage: lack of explanation facilities (research); the model is quite transparent, but consultations are not.

• Increasing popularity, despite difficulty in building