haystax bayesian networks


Upload: haystax-technology

Post on 20-Jun-2015


Page 1: Haystax bayesian networks

COMPANY PROPRIETARY INFORMATION

Haystax Advanced Analytics Lab

Automated Construction of Bayesian Networks from Qualitative Knowledge

Dr. Ed Wright

Dr. Bob Schrag

Haystax Advanced Threat Analytics

Page 2: Haystax bayesian networks

Outline
• Intro
  • Fusion, situation assessment
  • Bayesian Networks
  • Challenges – where do all the numbers come from?
• Qualitative representation of Common Patterns
  • Concepts, Indicators, Summary, Mitigation and Relevance
• Qualitative representation: default values => parameters for the CPTs
• Some implementation issues
• Examples
  • BN - Chest Xray
  • Complex Model
• Implementation
• Future enhancements
• Conclusion

Page 3: Haystax bayesian networks

Information Fusion - Situation Assessment
• Making inferences from heterogeneous, incomplete, contradictory information
  • Intelligence data fusion, Political forecasts, Medical diagnosis, Marketing, …
• Characteristics
  • Hypotheses of Interest – that cannot be directly observed
  • Indicating hypotheses – that are likely true (or false) if a Hypothesis of Interest is true
  • Evidence / information related to one or more of the hypotheses
  • Incomplete knowledge and Uncertainty

Page 4: Haystax bayesian networks

Information Fusion - Example

Hypothesis of Interest
• Will the Blue insurgency succeed in country Orange?

Indicating hypotheses
• Blue is popular
• Blue has the military capacity to succeed
  • Blue has adequate military communications
  • Blue has adequate weapons
  • Blue has operational success
  • Blue has operational failure
• Blue leadership is adequate
• Orange is Popular

Evidence
• Intel reports, radio intercepts, news reports, blogs, twitter, …

Incomplete knowledge and Uncertainty

[Figure: network diagram. Hypothesis nodes: Blue Insurgency Succeeds; Blue is Popular; Blue has Military Capability; Blue has Adequate Leadership; Orange is Vulnerable; Blue has military Comms; Blue has Adequate Weapons; Blue has Operational Success; Blue has Operational Failure. Evidence nodes: radio intercept; News Report (×4); Intel. Report; Twitter Sentiment.]

Page 5: Haystax bayesian networks

Bayesian Networks for Information Fusion
• Probabilistic Model
  • Executable model
  • Uncertainty representation is built in
• Explicit / Efficient representation
• Makes assumptions explicit
• Facilitates communication between analysts
• Support for learning, and for encoding prior knowledge
• Inference propagates in all directions
• Computational model
  • ‘What if’ analysis
  • Sensitivity analysis

[Figure: example network with nodes Hypothesis, Indicator1 and Indicator2, and their probability tables.]

Hypothesis (prior)
  true   false
  0.1    0.9

Indicator1 (given Hypothesis)
  Hypothesis   true   false
  true         0.8    0.2
  false        0.1    0.9

Indicator2 (given Hypothesis and Indicator 1)
  Hypothesis   Indicator 1   true   false
  T            T             0.96   0.04
  T            F             0.8    0.2
  F            T             0.8    0.2
  F            F             0.1    0.9
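To make "inference propagates in all directions" concrete, here is a minimal sketch of the backward step for a single arc. It uses only the numbers from the tables on this slide: a prior of 0.1 for the Hypothesis, and P(Indicator1 = true) of 0.8 or 0.1 depending on whether the Hypothesis is true or false. The function name is illustrative and not part of any Netica API.

```python
def posterior_true(prior_true, p_e_given_true, p_e_given_false):
    """Bayes' rule for a binary hypothesis after observing evidence e = true."""
    joint_true = prior_true * p_e_given_true            # P(H = true, e)
    joint_false = (1.0 - prior_true) * p_e_given_false  # P(H = false, e)
    return joint_true / (joint_true + joint_false)


# Observing Indicator1 = true raises the belief in the Hypothesis
# from the prior of 0.1 to 0.08 / 0.17, a little under one half.
belief = posterior_true(0.1, 0.8, 0.1)
print(f"P(Hypothesis = true | Indicator1 = true) = {belief:.3f}")
```

A Bayesian network engine performs the same calculation across every arc at once; the sketch only shows the arithmetic for one parent and one child.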

Page 6: Haystax bayesian networks

Challenges – Where do all of the numbers come from?
• Each node requires a local probability table
  • No parents -> Prior distribution
  • Node with Parents – Conditional Probability Table (CPT)
    • One row in the table for each combination of the states of the parents
• Where do the numbers come from?
  • Learning
  • Expert knowledge from Subject Matter Experts
  • Combination Knowledge + Learning
• DARPA Program Manager: ‘It is a humongous knowledge engineering challenge!’

[Figure: node Child (t/f) with four binary parents, Parent1 to Parent4 (t/f). Its CPT has columns Parent 1, Parent 2, Parent 3, Parent 4, true, false and 16 rows, one per combination of parent states from T T T T down to F F F F. No probability values are shown.]
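The size of the problem follows from the "one row per combination" rule: a binary child with n binary parents needs 2^n rows. A small sketch, with an illustrative helper name, that enumerates the rows in the same order as the table on this slide:

```python
from itertools import product


def cpt_parent_rows(n_parents):
    """All combinations of parent states, from all-true down to all-false."""
    return list(product([True, False], repeat=n_parents))


rows = cpt_parent_rows(4)
print(len(rows))  # number of rows an expert would have to fill in by hand
for row in rows[:3]:
    print(" ".join("T" if state else "F" for state in row))
```

Each added parent doubles the row count, which is why eliciting every number directly from subject matter experts does not scale.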

Page 7: Haystax bayesian networks

Patterns
• Concepts – Hypotheses
• Indicator
  • Evidence for or against one or more hypotheses
  • Also a hypothesis
  • We need a CPT function -> NoisyOR
• Mitigation, Relevance
  • Also a hypothesis
• Summary
  • Also a Hypothesis

[Figure: pattern diagrams built from Hypotheses, Indicator, Mitigation and Summary nodes.]

Page 8: Haystax bayesian networks

Qualitative assessment: Default Parameter Values
• Strength
  • Setting an evidence node true => same as applying likelihood to the hypothesis
  • Strength of evidence:
    • Use ratios
      • Strong = 8:1
      • Medium = 4:1
      • Weak = 2:1
      • Absolute = inf
• Polarity. If negative polarity, swap the parent state for the CPT calculation

[Figure: belief bars for a Hypothesis node with a single Indicator child. With no evidence, P(Hypothesis = true) = 12.0% and P(Indicator = true) = 18.4%. Setting the Indicator to true (100%) gives the beliefs below.]

  Indicator type    P(Hypothesis = true)
  Absolute          100%
  Absolute Neg      0%
  Strong            52.2%
  Strong Neg        1.68%
  Medium            35.3%
  Medium Neg        3.30%
  Weak              21.4%
  Weak Neg          6.38%
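The beliefs on this slide can be reproduced by treating each strength as a likelihood ratio applied to the prior odds of the hypothesis, and by inverting the ratio for negative polarity. The sketch below assumes that reading; the function and dictionary names are illustrative, and the absolute case (an infinite ratio) is left out.

```python
# Default strength ratios from the slide.
STRENGTH_RATIO = {"strong": 8.0, "medium": 4.0, "weak": 2.0}


def update_belief(prior, strength, positive=True):
    """Belief in the hypothesis after its indicator is set to true."""
    ratio = STRENGTH_RATIO[strength]
    if not positive:
        ratio = 1.0 / ratio  # negative polarity: the evidence counts against
    odds = prior / (1.0 - prior) * ratio
    return odds / (1.0 + odds)


for strength in ("strong", "medium", "weak"):
    pos = update_belief(0.12, strength)
    neg = update_belief(0.12, strength, positive=False)
    print(f"{strength:6s}  positive {100 * pos:5.1f}%   negative {100 * neg:5.2f}%")
```

With the 12.0% prior used on the slide, the strong positive case works out to odds of (0.12 / 0.88) × 8, or about 52.2%.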

Page 9: Haystax bayesian networks

CPT Generation - Indicators for Multiple Hypotheses

Need a function: F(a set of states of the parent nodes) => one row of the CPT
Using Strength and polarity

[Figure: indicator node ind2no with four parent hypotheses. Arc labels: strong, positive; medium, positive; weak, positive; strong, negative. Beliefs, P(true) in %:]

  Node          No evidence   ind2no = true
  Hypothesis1   12.0          32.8
  Hypothesis2   5.00          9.32
  Hypothesis3   8.00          11.3
  Hypothesis4   85.0          59.1
  ind2no        31.0          100
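One possible form for F is a leaky noisy-OR, the function the deck names for indicator CPTs. The sketch below is an assumption-laden reconstruction, not the authors' code: it assumes the four arcs attach to Hypothesis1 through Hypothesis4 in the order listed, and it assumes each causal strength is the strength ratio times the leak (0.8, 0.4 and 0.2 at the default leak of 0.1). Neither assumption is stated in the slides; they are used because together they reproduce the beliefs shown in the figure.

```python
from itertools import product

RATIO = {"strong": 8.0, "medium": 4.0, "weak": 2.0}
LEAK = 0.1  # default leak from the slides

# (name, prior P(true), strength, positive polarity?) -- assumed arc order
PARENTS = [
    ("Hypothesis1", 0.12, "strong", True),
    ("Hypothesis2", 0.05, "medium", True),
    ("Hypothesis3", 0.08, "weak", True),
    ("Hypothesis4", 0.85, "strong", False),
]


def cpt_row(states, leak=LEAK):
    """F(parent states) -> P(indicator = true) for one CPT row."""
    p_false = 1.0 - leak
    for (_, _, strength, positive), state in zip(PARENTS, states):
        active = state if positive else not state  # negative polarity swaps the state
        if active:
            p_false *= 1.0 - RATIO[strength] * leak  # assumed p_i = ratio * leak
    return 1.0 - p_false


def joint(states):
    """P(parent states, indicator = true), treating the parents as independent."""
    p = cpt_row(states)
    for (_, prior, _, _), state in zip(PARENTS, states):
        p *= prior if state else 1.0 - prior
    return p


ALL_STATES = list(product([True, False], repeat=len(PARENTS)))
p_indicator = sum(joint(s) for s in ALL_STATES)
posteriors = {
    name: sum(joint(s) for s in ALL_STATES if s[i]) / p_indicator
    for i, (name, _, _, _) in enumerate(PARENTS)
}

print(f"P(ind2no = true) = {100 * p_indicator:.1f}%")
for name, value in posteriors.items():
    print(f"P({name} = true | ind2no = true) = {100 * value:.1f}%")
```

Brute-force enumeration is fine for four parents; a real network would rely on the inference engine instead.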

Page 10: Haystax bayesian networks

CPT Generation - NoisyOR (Other Functions are possible, e.g. NoisyAnd)
• NoisyOR for CPT
  • Need strength(s) and Leak
    • Default leak = 0.1
    • Can be overridden
• Strength
  • Strength of evidence - Use ratios
    • Strong = 8:1
    • Medium = 4:1
    • Weak = 2:1
    • Absolute = inf
• Polarity. If negative polarity, swap the parent state for the CPT calculation
• Absolute strength: replace the row where the absolute parent is false with [0, 1.0]

E is the child node
C is the set of parent nodes
True(C) is the set of parents whose state is true for the CPT element being calculated
p_i is the causal strength
p_0 is the leak

Source: Noisy-Or, -And, -Max and -Sum Nodes in Netica, 2008-02-08, © 2000-2008 Norsys Software Corp.
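The symbol definitions above accompany a formula that did not survive in this transcript. As a hedged reconstruction, the standard leaky noisy-OR form that matches those symbols is:

```latex
P(E = \mathrm{true} \mid C) \;=\; 1 - (1 - p_0) \prod_{i \,\in\, \mathrm{True}(C)} (1 - p_i)
```

When no parent is true the product is empty, so the probability reduces to the leak p_0 (0.1 by default). Each parent that is true multiplies the chance that the child stays false by another factor of (1 - p_i).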

Page 11: Haystax bayesian networks

CPT Generation: Relevance and Mitigation

[Figure: two pattern diagrams. Mitigation pattern: Hypothesis, Mitigation, Indicator with Mitigation, Evidence for or against Mitigation. Relevance pattern: Hypothesis, Relevance, Indicator with Relevance, Evidence for or against Relevance. Beliefs for the mitigation pattern, P(true) in %:]

  Case                                     Evidence for Mitigation   Mitigation   Indicator with Mitigation   Hypothesis
  No evidence                              18.4                      12.0         22.2                        12.0
  Indicator true, Mitigation false         10.0                      0            100                         52.2
  Indicator true, Mitigation true          80.0                      100          100                         12.0
  Indicator true, Evidence for Mit. true   100                       74.8         100                         22.1

Page 12: Haystax bayesian networks

CPT Generation: Relevance and Mitigation
• Desired(?): applying evidence does not change the mitigation

[Figure: three belief panels, P(true) in %:]

  Panel   Indicator with Mitigation   Evidence in Mitigation   Mitigation   Hypothesis
  1       1.62                        0                        2.94         0.50
  2       100                         0                        91.0         9.46
  3       100                         0                        2.94         97.1

Need to do the algebra to figure out what numbers will result in no change to the mitigation when evidence is applied.
Note: when evidence is applied elsewhere in the network, mitigation will change.

Page 13: Haystax bayesian networks

Patterns - Extras
• Copy of / opposite of
• Rare event
  • Override the default Leak
• Target beliefs
  • Prior P(t) is too small or too large
  • Calculate the CPT for artificial evidence that, when applied, brings P(t) to the desired value

[Figure: a Hypothesis node linked to Opposite and Synonym nodes by a Deterministic Relationship; the pattern diagrams of Hypotheses, Indicator, Mitigation and Summary nodes; and a TgtBelCnstrnt node attached to a hypothesis.]
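A minimal sketch of how such a target-belief CPT could be calculated, assuming the artificial evidence node has the constrained node as its only parent and is always set to true. This is an illustration of the idea, not the authors' algorithm, and the function names are hypothetical. Both beliefs must lie strictly between 0 and 1.

```python
def odds(p):
    return p / (1.0 - p)


def target_belief_cpt(current, target):
    """Return (P(A=true | H=true), P(A=true | H=false)) for artificial evidence A.

    Setting A = true multiplies the odds of H by the ratio of the two values,
    so the ratio is chosen to move the odds from `current` to `target`.
    """
    ratio = odds(target) / odds(current)
    if ratio >= 1.0:
        return 1.0, 1.0 / ratio
    return ratio, 1.0


def belief_after_evidence(current, cpt):
    """Belief in H once the artificial evidence node is set to true."""
    p_true, p_false = cpt
    joint_true = current * p_true
    joint_false = (1.0 - current) * p_false
    return joint_true / (joint_true + joint_false)


cpt = target_belief_cpt(current=0.2, target=0.01)
print(cpt, belief_after_evidence(0.2, cpt))
```

Scaling the larger of the two entries to 1.0 is an arbitrary choice: only the ratio between them affects the resulting belief.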

Page 14: Haystax bayesian networks

Patterns - Extras
• Hypothesis Copy
  • If a node is a parent to a summary node, the effect of mitigation or relevance on that node gets ignored
  • The construction software automatically detects this, and introduces a Hypothesis copy as the parent of the original hypothesis, and of the summary

[Figure: two diagrams of Hypotheses, Mitigation, Indicator and Summary nodes; the second adds a Hyp Copy node linked by a Deterministic Relationship.]

Page 15: Haystax bayesian networks

Visualization

[Figure: legend and example network. Arc styles distinguish Positive Influence from Negative Influence, and Absolute, Strong, Moderate and Weak Influence. Example arc labels: Absolute, Positive; Absolute, Negative (×2); Weak, Negative; Strong, Positive (×3); Moderate, Positive (×2).]

Page 16: Haystax bayesian networks

Example - Chest X-Ray

Qualitative statements

Tuberculosis is a medium positive indicator of VisitToAsia
LungCancer is a strong positive indicator of Smoking
Bronchitis is a weak positive indicator of Smoking
XRay is a strong positive indicator of TuberculosisOrCancer
Dyspnea is a strong positive indicator of Bronchitis
Dyspnea is a strong positive indicator of TuberculosisOrCancer
TuberculosisOrCancer is an OR summary of [Tuberculosis, LungCancer]

VisitToAsia prior: 0.01
Smoking prior: 0.5
Tuberculosis targetBelief: 0.01
LungCancer leak: 0.01
Bronchitis leak: 0.30
XRay leak: 0.05

[Figure: original Chest X-Ray model. Beliefs in %:]

  Node                     First state                           Belief   Second state   Belief
  Visit to Asia            Visited Asia within the last 3 y...   1.00     no visit       99.0
  Smoking                  smoker                                50.0     nonsmoker      50.0
  Tuberculosis             present                               1.04     absent         99.0
  Lung Cancer              present                               5.50     absent         94.5
  Bronchitis               present                               45.0     absent         55.0
  Tuberculosis or Cancer   true                                  6.48     false          93.5
  XRay                     abnormal                              11.0     normal         89.0
  Dyspnea                  present                               43.6     absent         56.4

Original model: Based on Lauritzen & Spiegelhalter 1988. Distributed by Norsys Software Corp.

Page 17: Haystax bayesian networks

Example - Chest Xray

[Figure: the original model next to the automatically constructed one. Beliefs for the first state of each node (present / true / abnormal / smoker / visited), in %:]

  Node                        Original model   Automatically constructed
  Visit to Asia               1.00             0.64
  Smoking                     50.0             50.0
  Tuberculosis                1.04             1.0
  Lung Cancer                 5.50             4.96
  Bronchitis                  45.0             51.0
  Tuberculosis or Cancer      6.48             5.91
  XRay                        11.0             7.25
  Dyspnea                     43.6             48.8
  TgtBelCnstForTuberculosis   (not present)    100

Original model: Based on Lauritzen & Spiegelhalter 1988. Distributed by Norsys Software Corp.

Second model: Automatically constructed from qualitative representation.

Page 18: Haystax bayesian networks

Example – Complex Model

• Bayesian Network representing concepts and relationships defined in a set of source documents

• 700(+) concepts

• The Government Customer: “for the first time, we have a computational model of the concepts in this document!”

Page 19: Haystax bayesian networks

Model Development Process

1. Extract key concept nodes.
2. Extract influence arcs, specifying…
  • Strength: Absolute, strong, moderate, weak, …
  • Polarity: Positive, negative
3. Create master influence graph.
  • Assemble graph from extracted arcs.
  • Review extracted concepts, influences.
  • Normalize extracted concept names, definitions.
  • Insert missing concepts, influences.
  • Add missing logical structure: AND, OR, Opposite.
4. Compile into standard Bayesian network (BN).
  • Collect concepts’ parent nodes, build conditional probability tables (CPTs) per influence specs.
  • Insert pattern-required nodes for mitigation, relevance.
5. Exploit, improve the model.
  • Run the model against test cases.
  • Review model inferences with SMEs.
  • Revise as appropriate.

Page 20: Haystax bayesian networks

Master Influence Graph

(defparameter *Influences*
  '((VisitToAsia
     (:IndicatedBy (:Moderately (Tuberculosis))))
    (Smoking
     (:IndicatedBy
      (:Strongly (LungCancer))
      (:Weakly (Bronchitis
                (:IndicatedBy (:Strongly (Dyspnea)))))))
    (TuberculosisOrCancer
     (:ImpliedByDisjunction (Tuberculosis) (LungCancer))
     (:IndicatedBy (:Strongly (XRay) (Dyspnea))))))
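To show how a nested specification like this can be flattened into individual arcs (the "assemble graph from extracted arcs" step), here is a rough Python sketch. The tuple encoding, the strength names and the `extract_arcs` function are all assumptions made for illustration; the deck's own tooling is the Lisp structure above, and its traversal code is not shown in the slides.

```python
# Each concept is (name, clauses); each clause is (keyword, groups).
# For :IndicatedBy, a group is (strength keyword, [indicator concepts]).
# For :ImpliedByDisjunction, the groups are the disjunct concepts themselves.
STRENGTHS = {":Strongly": "strong", ":Moderately": "medium", ":Weakly": "weak"}

INFLUENCES = [
    ("VisitToAsia", [
        (":IndicatedBy", [(":Moderately", [("Tuberculosis", [])])]),
    ]),
    ("Smoking", [
        (":IndicatedBy", [
            (":Strongly", [("LungCancer", [])]),
            (":Weakly", [("Bronchitis", [
                (":IndicatedBy", [(":Strongly", [("Dyspnea", [])])]),
            ])]),
        ]),
    ]),
    ("TuberculosisOrCancer", [
        (":ImpliedByDisjunction", [("Tuberculosis", []), ("LungCancer", [])]),
        (":IndicatedBy", [(":Strongly", [("XRay", []), ("Dyspnea", [])])]),
    ]),
]


def extract_arcs(concept):
    """Yield (kind, source, strength, target) tuples, recursing into nested concepts."""
    name, clauses = concept
    for keyword, groups in clauses:
        if keyword == ":IndicatedBy":
            for strength_kw, indicators in groups:
                for indicator in indicators:
                    yield ("indicator", indicator[0], STRENGTHS[strength_kw], name)
                    yield from extract_arcs(indicator)
        elif keyword == ":ImpliedByDisjunction":
            for disjunct in groups:
                yield ("or-summary", name, None, disjunct[0])
                yield from extract_arcs(disjunct)


arcs = [arc for concept in INFLUENCES for arc in extract_arcs(concept)]
for arc in arcs:
    print(arc)
```

The six indicator arcs and the two OR-summary inputs line up with the qualitative statements listed for the Chest X-Ray example.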

[Figure: original Chest X-Ray model, with the same node beliefs as shown on Page 16.]

Page 21: Haystax bayesian networks

Implementation
• Netica, NeticaJ API
• Jython
  • Integrate with Java/NeticaJ
  • Higher level tools on top of NeticaJ
• GraphViz
  • Netica does not do graph layout
  • Build the BN in Netica (all the nodes are on top of each other)
  • Extract nodes and links
  • Use GraphViz to layout
  • Update each Node in Netica with new coordinates
• Franz Lisp / AllegroGraph – Netica API
  • Master Influence Graph
  • Ingesting data and applying evidence to the network
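The GraphViz round trip described above can be sketched without any Netica calls: write the extracted nodes and links as DOT text, run Graphviz's `dot -Tplain` on it outside this script, and read the node coordinates back from the plain-text output. The helper names and the sample output below are illustrative assumptions (the sample is hand-written, not produced by an actual run), and the parser assumes node names without spaces.

```python
def to_dot(nodes, links):
    """Render nodes and (parent, child) links as a DOT digraph."""
    lines = ["digraph bn {"]
    for node in nodes:
        lines.append(f'  "{node}";')
    for parent, child in links:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


def parse_plain(plain_text):
    """Read node centre coordinates from `dot -Tplain` output.

    Node lines have the form: node <name> <x> <y> <width> <height> ...
    """
    coords = {}
    for line in plain_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "node":
            coords[parts[1].strip('"')] = (float(parts[2]), float(parts[3]))
    return coords


# Hand-written sample in the plain format, for illustration only.
SAMPLE_PLAIN = """graph 1 2.5 2.5
node Smoking 1.25 2.0 1.2 0.5 Smoking solid ellipse black lightgrey
node LungCancer 1.25 0.5 1.4 0.5 LungCancer solid ellipse black lightgrey
edge Smoking LungCancer 4 1.25 1.7 1.25 1.5 1.25 1.2 1.25 0.8 solid black
stop
"""

dot_text = to_dot(["Smoking", "LungCancer"], [("Smoking", "LungCancer")])
layout = parse_plain(SAMPLE_PLAIN)
print(dot_text)
print(layout)  # these coordinates would then be written back to each node
```

The last step, updating each node's position in Netica, is left out because it depends on the NeticaJ calls the authors used, which the slides do not show.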

Page 22: Haystax bayesian networks

Limitations & Future Enhancements
• Limitations
  • Binary Nodes
  • Limited (but extendable) set of patterns
• Future Enhancements
  • Strength of mitigation / relevance
  • Richer set of qualitative statements
  • Additional CPT models (NoisyAnd, …)
  • Global / local mitigation
  • Visual editor

Page 23: Haystax bayesian networks

Conclusion

For a large set of information fusion problems, it is possible to shortcut the ‘humongous knowledge engineering challenge’ by automatically generating a usable Bayesian Network from English-like qualitative statements (and a few numbers).

The resulting Bayesian Network is immediately useful, and can be a starting point for further knowledge refinement.

Page 24: Haystax bayesian networks

Thank You

Contact us: [email protected]

Visit us: www.haystax.com

8251 Greensboro Drive, Suite 1111

McLean, VA 22012