haystax bayesian networks
COMPANY PROPRIETARY INFORMATION
Haystax Advanced Analytics Lab
Automated Construction of Bayesian Networks from Qualitative Knowledge
Dr. Ed Wright
Dr. Bob Schrag
Haystax Advanced Threat Analytics
Outline
• Intro
  • Fusion, situation assessment
  • Bayesian Networks
  • Challenges – where do all the numbers come from?
• Qualitative representation of common patterns
  • Concepts, Indicators, Summary, Mitigation and Relevance
• Qualitative representation: default values => parameters for the CPTs
• Some implementation issues
• Examples
  • BN - Chest X-Ray
  • Complex Model
• Implementation
• Future enhancements
• Conclusion
Information Fusion - Situation Assessment
• Making inferences from heterogeneous, incomplete, contradictory information
  • Intelligence data fusion, political forecasts, medical diagnosis, marketing, …
• Characteristics
  • Hypotheses of interest – that cannot be directly observed
  • Indicating hypotheses – that are likely true (or false) if a hypothesis of interest is true
  • Evidence / information related to one or more of the hypotheses
  • Incomplete knowledge and uncertainty
Information Fusion - Example
Hypothesis of interest
• Will the Blue insurgency succeed in country Orange?
Indicating hypotheses
• Blue is popular
• Blue has the military capacity to succeed
  • Blue has adequate military communications
  • Blue has adequate weapons
  • Blue has operational success
  • Blue has operational failure
• Blue leadership is adequate
• Orange is popular
Evidence
• Intel reports, radio intercepts, news reports, blogs, Twitter, …
Incomplete knowledge and uncertainty
[Figure: network diagram for the example. Hypothesis nodes: Blue Insurgency Succeeds, Blue is Popular, Blue has Military Capability, Blue has Adequate Leadership, Orange is Vulnerable, Blue has Military Comms, Blue has Adequate Weapons, Blue has Operational Success, Blue has Operational Failure. Evidence nodes: radio intercept, Intel. Report, Twitter Sentiment, News Report (four).]
Bayesian Networks for Information Fusion
• Probabilistic model
  • Executable model
  • Uncertainty representation is built in
• Explicit / efficient representation
  • Makes assumptions explicit
  • Facilitates communication between analysts
  • Support for learning, and for encoding prior knowledge
  • Inference propagates in all directions
• Computational model
  • ‘What if’ analysis
  • Sensitivity analysis
[Figure: example network with nodes Hypothesis, Indicator1, Indicator2 and their probability tables]
Hypothesis (prior)
  true   false
  0.1    0.9
Indicator1 (given Hypothesis)
  Hypothesis   true   false
  true         0.8    0.2
  false        0.1    0.9
Indicator2 (given Hypothesis, Indicator 1)
  Hypothesis   Indicator 1   true   false
  T            T             0.96   0.04
  T            F             0.8    0.2
  F            T             0.8    0.2
  F            F             0.1    0.9
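As a minimal sketch of what "executable model" and "inference propagates in all directions" mean, the Python below enumerates the joint distribution of the three-node example using the table values from this slide. The function and variable names are illustrative only; they are not part of Netica or of the Haystax tooling.

```python
from itertools import product

# Table values copied from the slide; names are illustrative.
P_H = {True: 0.1, False: 0.9}                    # P(Hypothesis)
P_I1 = {True: 0.8, False: 0.1}                   # P(Indicator1 = true | Hypothesis)
P_I2 = {(True, True): 0.96, (True, False): 0.8,  # P(Indicator2 = true | Hypothesis, Indicator1)
        (False, True): 0.8, (False, False): 0.1}


def joint(h, i1, i2):
    """Joint probability of one full assignment, by the chain rule."""
    p = P_H[h]
    p *= P_I1[h] if i1 else 1.0 - P_I1[h]
    p *= P_I2[(h, i1)] if i2 else 1.0 - P_I2[(h, i1)]
    return p


def posterior_hypothesis(evidence):
    """P(Hypothesis = true | evidence); evidence maps 'i1' / 'i2' to booleans."""
    num = den = 0.0
    for h, i1, i2 in product([True, False], repeat=3):
        # Skip assignments that contradict the observed evidence.
        if evidence.get("i1", i1) != i1 or evidence.get("i2", i2) != i2:
            continue
        p = joint(h, i1, i2)
        den += p
        if h:
            num += p
    return num / den


if __name__ == "__main__":
    print(round(posterior_hypothesis({}), 3))            # 0.1, the prior
    print(round(posterior_hypothesis({"i1": True}), 3))  # 0.471
```

Brute-force enumeration is only practical for tiny networks; it is used here because it makes the chain rule visible.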
Challenges – Where do all of the numbers come from?
• Each node requires a local probability table
  • No parents -> prior distribution
  • Node with parents -> Conditional Probability Table (CPT)
  • One row in the table for each combination of the states of the parents
• Where do the numbers come from?
  • Learning
  • Expert knowledge from Subject Matter Experts
  • Combination: knowledge + learning
• DARPA Program Manager: ‘It is a humongous knowledge engineering challenge!’
[Figure: Parent1, Parent2, Parent3, Parent4 (each t/f) feeding Child (t/f), with the empty CPT for Child]
  Parent 1   Parent 2   Parent 3   Parent 4   true   false
  T          T          T          T
  T          T          T          F
  T          T          F          T
  T          T          F          F
  T          F          T          T
  T          F          T          F
  T          F          F          T
  T          F          F          F
  F          T          T          T
  F          T          T          F
  F          T          F          T
  F          T          F          F
  F          F          T          T
  F          F          T          F
  F          F          F          T
  F          F          F          F
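To make the size of the problem concrete, here is a small sketch that lists the parent-state combinations for a binary child; each one is a CPT row that needs a number. The function name is hypothetical.

```python
from itertools import product


def cpt_parent_rows(parent_names):
    """Return every combination of true/false parent states, one per CPT row."""
    return list(product([True, False], repeat=len(parent_names)))


if __name__ == "__main__":
    rows = cpt_parent_rows(["Parent1", "Parent2", "Parent3", "Parent4"])
    print(len(rows))  # 16 rows for four binary parents
    for n in (4, 8, 16):
        # The row count doubles with every additional binary parent.
        print(n, "parents ->", 2 ** n, "rows")
```

The exponential growth in rows is why eliciting every number from an expert does not scale.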
Patterns
• Concepts – Hypotheses
• Indicator
  • Evidence for or against one or more hypotheses
  • Also a hypothesis
  • We need a CPT function -> NoisyOR
• Mitigation, Relevance
  • Also a hypothesis
• Summary
  • Also a hypothesis
[Figure: pattern diagrams. Hypotheses with an Indicator; Hypotheses with an Indicator and a Mitigation; Indicators with a Summary.]
Qualitative assessment: Default Parameter Values
• Strength
  • Setting an evidence node true => same as applying a likelihood to the hypothesis
  • Strength of evidence: use ratios
    • Strong = 8:1
    • Medium = 4:1
    • Weak = 2:1
    • Absolute = inf
• Polarity. If negative polarity, swap the parent state for the CPT calculation
[Figure: belief bars (percent) for a Hypothesis with a single Indicator; each row shows the Hypothesis after the named Indicator is set true]
  Indicator                                  Hypothesis true   Hypothesis false
  No evidence (Indicator 18.4 / 81.6)        12.0              88.0
  Absolute Indicator                         100               0
  Absolute Neg Indicator                     0                 100
  Strong Indicator                           52.2              47.8
  Strong Neg Indicator                       1.68              98.3
  Medium Indicator                           35.3              64.7
  Medium Neg Indicator                       3.30              96.7
  Weak Indicator                             21.4              78.6
  Weak Neg Indicator                         6.38              93.6
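The belief values on this slide are consistent with a plain odds update: multiply the prior odds of the hypothesis by the strength ratio (or by its reciprocal for negative polarity). The sketch below assumes exactly that reading; the function name and the dictionary are illustrative, and the absolute case (an infinite ratio) is left out.

```python
# Strength ratios from the slide; "absolute" (infinite ratio) is not handled here.
STRENGTH_RATIO = {"strong": 8.0, "medium": 4.0, "weak": 2.0}


def posterior_given_indicator(prior, strength, positive=True):
    """P(hypothesis | indicator = true) when the indicator carries the given likelihood ratio."""
    ratio = STRENGTH_RATIO[strength]
    if not positive:
        ratio = 1.0 / ratio  # negative polarity: evidence counts against the hypothesis
    odds = prior / (1.0 - prior) * ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    prior = 0.12  # Hypothesis prior shown on the slide
    for strength in ("strong", "medium", "weak"):
        for positive in (True, False):
            p = posterior_given_indicator(prior, strength, positive)
            print(strength, "positive" if positive else "negative", round(100 * p, 2))
```

With a prior of 0.12 this gives roughly 52.2, 35.3 and 21.4 percent for the positive indicators and 1.68, 3.30 and 6.38 percent for the negative ones, in line with the belief bars.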
CPT Generation - Indicators for Multiple Hypotheses
Need a function: F(a set of states of the parent nodes) => one row of the CPT, using strength and polarity
[Figure: belief bars (percent true / false) for indicator ind2no with four parent hypotheses]
  Node          Strength, polarity   No evidence    ind2no = true
  Hypothesis1   strong, positive     12.0 / 88.0    32.8 / 67.2
  Hypothesis2   medium, positive     5.00 / 95.0    9.32 / 90.7
  Hypothesis3   weak, positive       8.00 / 92.0    11.3 / 88.7
  Hypothesis4   strong, negative     85.0 / 15.0    59.1 / 40.9
  ind2no                             31.0 / 69.0    100 / 0
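One candidate for F is a leaky noisy-OR. The sketch below is an assumption-laden illustration, not the tool's code: it takes each link probability to be the strength ratio times the leak (so strong is 0.8 at the default leak of 0.1), treats a negative-polarity parent as active when it is false, and assumes the four hypotheses are independent with the priors shown in the figure. All names are hypothetical.

```python
from itertools import product

STRENGTH_RATIO = {"strong": 8.0, "medium": 4.0, "weak": 2.0}


def noisy_or_row(parent_states, links, leak=0.1):
    """P(child = true) for one CPT row.

    links holds one (strength, positive) pair per parent, in parent order.
    Assumption: link probability = strength ratio * leak.
    """
    p_false = 1.0 - leak
    for state, (strength, positive) in zip(parent_states, links):
        link = STRENGTH_RATIO[strength] * leak
        if link > 1.0:
            raise ValueError("ratio * leak exceeds 1; choose a smaller leak")
        active = state if positive else not state  # negative polarity swaps the state
        if active:
            p_false *= 1.0 - link
    return 1.0 - p_false


def build_cpt(links, leak=0.1):
    """Map every parent-state combination to P(child = true)."""
    return {row: noisy_or_row(row, links, leak)
            for row in product([True, False], repeat=len(links))}


def child_marginal(cpt, priors):
    """P(child = true) for independent parents with the given P(true) values."""
    total = 0.0
    for row, p_child in cpt.items():
        weight = 1.0
        for state, prior in zip(row, priors):
            weight *= prior if state else 1.0 - prior
        total += weight * p_child
    return total


if __name__ == "__main__":
    links = [("strong", True), ("medium", True), ("weak", True), ("strong", False)]
    cpt = build_cpt(links)
    print(round(100 * child_marginal(cpt, [0.12, 0.05, 0.08, 0.85]), 1))  # 31.0
```

Under these assumptions the marginal for ind2no comes out at about 31.0 percent, the value shown in the figure, which suggests (but does not prove) that the tool uses a similar parameterization.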
CPT Generation - NoisyOR (other functions are possible, e.g. NoisyAnd)
• NoisyOR for CPT
  • Needs strength(s) and leak
  • Default leak = 0.1
    • Can be overridden
• Strength
  • Strength of evidence: use ratios
    • Strong = 8:1
    • Medium = 4:1
    • Weak = 2:1
    • Absolute = inf
• Polarity. If negative polarity, swap the parent state for the CPT calculation
• Absolute strength: replace the row where the absolute parent is false with [0, 1.0]
Notation
• E is the child node
• C is the set of parent nodes
• True(C) is the set of parents whose state is true for the CPT element being calculated
• p_i is the causal strength
• p_0 is the leak
Source: Noisy-Or, -And, -Max and -Sum Nodes in Netica, 2008-02-08, © 2000-2008 Norsys Software Corp.
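The formula that these symbols belong to did not survive in this transcript. For reference, the usual leaky noisy-OR written with the same symbols is given below; it is assumed, not confirmed, to be the form shown on the slide.

```latex
P(E = \text{true} \mid C) \;=\; 1 - (1 - p_0) \prod_{i \,\in\, \mathrm{True}(C)} (1 - p_i)
```

When no parent is true the product is empty, so the child is true with probability equal to the leak.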
CPT Generation: Relevance and Mitigation
[Figure: pattern diagrams. Mitigation pattern nodes: Hypothesis, Indicator with Mitigation, Mitigation, Evidence for or against Mitigation. Relevance pattern nodes: Hypothesis, Relevance, Indicator with Relevance, Evidence for or against Relevance.]
[Figure: belief bars (percent true / false), listed in extraction order; panel grouping is approximate]
  Mitigation 12.0 / 88.0; Indicator with Mitigation 22.2 / 77.8; Hypothesis 12.0 / 88.0
  Evidence for Mitigation 18.4 / 81.6; Mitigation 0 / 100; Indicator with Mitigation 100 / 0; Hypothesis 52.2 / 47.8
  Evidence for Mitigation 10.0 / 90.0; Mitigation 100 / 0; Indicator with Mitigation 100 / 0; Hypothesis 12.0 / 88.0
  Evidence for Mitigation 80.0 / 20.0; Indicator with Mitigation 100 / 0; Hypothesis 22.1 / 77.9; Mitigation 74.8 / 25.2
  Evidence for Mitigation 100 / 0
CPT Generation: Relevance and Mitigation
• Desired(?): applying evidence does not change the mitigation
[Figure: belief bars (percent true / false), three panels]
  Node                        Panel 1        Panel 2        Panel 3
  Indicator with Mitigation   1.62 / 98.4    100 / 0        100 / 0
  Evidence in Mitigation      0 / 100        0 / 100        0 / 100
  Mitigation                  2.94 / 97.1    91.0 / 9.01    2.94 / 97.1
  Hypothesis                  0.50 / 99.5    9.46 / 90.5    97.1 / 2.93
Need to do the algebra to figure out what numbers will result in no change to the mitigation when evidence is applied.
Note: when evidence is applied elsewhere in the network, mitigation will change.
Patterns - Extras
• Copy of / opposite of
  • Deterministic relationship between a Hypothesis and its Synonym or Opposite
• Rare event
  • Override the default leak
• Target beliefs
  • Used when the prior P(t) is too small or too large
  • TgtBelCnstrnt: calculate the CPT for an artificial evidence node that, when applied, brings P(t) to the desired value
[Figure: pattern diagrams. Hypothesis with Opposite and Synonym; Hypotheses with Indicator, Mitigation and Summary nodes; a TgtBelCnstrnt node.]
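The slides do not say how the target-belief CPT is calculated. One simple way to get the stated effect, sketched below as an assumption, is to give the artificial evidence node the likelihood ratio that moves the node's current belief to the target. Both function names are hypothetical, and `prior` here means the node's belief before the constraint is applied.

```python
def target_belief_cpt(prior, target):
    """Return (P(E=true | H=true), P(E=true | H=false)) for an artificial
    evidence node E such that P(H=true | E=true) equals target."""
    if not (0.0 < prior < 1.0 and 0.0 < target < 1.0):
        raise ValueError("prior and target must be strictly between 0 and 1")
    # Likelihood ratio needed to move the prior odds to the target odds.
    ratio = (target / (1.0 - target)) / (prior / (1.0 - prior))
    if ratio >= 1.0:
        return 1.0, 1.0 / ratio
    return ratio, 1.0


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' rule for a binary hypothesis with the evidence node set true."""
    num = prior * p_e_given_h
    return num / (num + (1.0 - prior) * p_e_given_not_h)


if __name__ == "__main__":
    cpt = target_belief_cpt(prior=0.10, target=0.01)
    print(cpt)
    print(posterior(0.10, *cpt))  # back at the target, up to floating-point error
```

Only the ratio of the two CPT entries matters once the node is set true, so the larger entry is fixed at 1.0 here for convenience. In a full network the belief will still move when other evidence is applied.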
Patterns - Extras
• Hypothesis Copy
  • If a node is a parent to a summary node, the effect of mitigation or relevance on that node gets ignored
  • The construction software automatically detects this and introduces a Hypothesis copy as the parent of the original hypothesis and of the summary
[Figure: before and after diagrams. Before: Hypotheses, Mitigation, Indicator, Summary. After: the same nodes plus Hyp Copy, linked by a deterministic relationship.]
Visualization
• Arc legend by polarity: positive influence, negative influence
• Arc legend by strength: absolute influence, strong influence, moderate influence, weak influence
[Figure: example network with labeled arcs: Absolute, Positive (1); Absolute, Negative (2); Strong, Positive (3); Moderate, Positive (2); Weak, Negative (1).]
Example - Chest X-Ray
Qualitative statements
Tuberculosis is a medium positive indicator of VisitToAsia
LungCancer is a strong positive indicator of Smoking
Bronchitis is a weak positive indicator of Smoking
XRay is a strong positive indicator of TuberculosisOrCancer
Dyspnea is a strong positive indicator of Bronchitis
Dyspnea is a strong positive indicator of TuberculosisOrCancer
TuberculosisOrCancer is an OR summary of [Tuberculosis, LungCancer]
VisitToAsia prior: 0.01
Smoking prior: 0.5
Tuberculosis targetBelief: 0.01
LungCancer leak: 0.01
Bronchitis leak: 0.30
XRay leak: 0.05
[Figure: original Chest X-Ray network with no evidence, beliefs in percent]
  Node                     States                                           Belief
  Visit to Asia            Visited Asia within the last 3 y... / no visit   1.00 / 99.0
  Smoking                  smoker / nonsmoker                               50.0 / 50.0
  Tuberculosis             present / absent                                 1.04 / 99.0
  Lung Cancer              present / absent                                 5.50 / 94.5
  Bronchitis               present / absent                                 45.0 / 55.0
  Tuberculosis or Cancer   true / false                                     6.48 / 93.5
  XRay                     abnormal / normal                                11.0 / 89.0
  Dyspnea                  present / absent                                 43.6 / 56.4
Original model: Based on Lauritzen & Spiegelhalter 1988. Distributed by Norsys Software Corp.
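To show how statements of this shape can become structured input, here is a hedged sketch of a parser. The grammar is inferred only from the statements on this slide; the real tool's input format is not described here, and every name in the code is hypothetical.

```python
import re

# Patterns inferred from the example statements; not the tool's actual grammar.
INDICATOR_RE = re.compile(
    r"^(?P<indicator>\w+) is an? (?P<strength>absolute|strong|medium|weak) "
    r"(?P<polarity>positive|negative) indicator of (?P<hypothesis>\w+)$",
    re.IGNORECASE,
)
SUMMARY_RE = re.compile(
    r"^(?P<summary>\w+) is an? (?P<kind>OR|AND) summary of \[(?P<parts>[^\]]+)\]$"
)
PARAM_RE = re.compile(
    r"^(?P<node>\w+) (?P<param>prior|targetBelief|leak):? (?P<value>[0-9.]+)$"
)


def parse_statement(line):
    """Turn one qualitative statement into a tagged tuple."""
    line = line.strip()
    m = INDICATOR_RE.match(line)
    if m:
        return ("indicator", m["indicator"], m["strength"].lower(),
                m["polarity"].lower(), m["hypothesis"])
    m = SUMMARY_RE.match(line)
    if m:
        parts = [p.strip() for p in m["parts"].split(",")]
        return ("summary", m["summary"], m["kind"], parts)
    m = PARAM_RE.match(line)
    if m:
        return ("param", m["node"], m["param"], float(m["value"]))
    raise ValueError(f"unrecognized statement: {line!r}")


if __name__ == "__main__":
    for text in (
        "LungCancer is a strong positive indicator of Smoking",
        "TuberculosisOrCancer is an OR summary of [Tuberculosis, LungCancer]",
        "LungCancer leak: 0.01",
    ):
        print(parse_statement(text))
```

Anything outside these three shapes raises an error instead of being guessed at, which keeps mistakes in the qualitative input visible.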
Example - Chest Xray
[Figure: beliefs in percent with no evidence, original model next to the automatically constructed model]
  Node                     Original (first state)   Constructed (true)
  Visit to Asia            1.00                     0.64
  Smoking                  50.0                     50.0
  Tuberculosis             1.04                     1.0
  Lung Cancer              5.50                     4.96
  Bronchitis               45.0                     51.0
  Tuberculosis or Cancer   6.48                     5.91
  XRay                     11.0                     7.25
  Dyspnea                  43.6                     48.8
The constructed model also contains the node TgtBelCnstForTuberculosis, shown at true 100 / false 0.
Original model: Based on Lauritzen & Spiegelhalter 1988. Distributed by Norsys Software Corp.
Constructed model: automatically constructed from qualitative representation.
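As a rough consistency check (an inference from the numbers, not something the slides state), two of the constructed beliefs can be reproduced by hand. The working assumes a leaky noisy-OR in which each link probability is the strength ratio times the node's leak, together with the Smoking prior of 0.5 and the stated leaks for LungCancer and Bronchitis:

```latex
\begin{aligned}
P(\text{LungCancer}) &= 0.5\,\bigl[1 - (1 - 0.01)(1 - 8 \times 0.01)\bigr] + 0.5 \times 0.01 \approx 0.0496 \\
P(\text{Bronchitis}) &= 0.5\,\bigl[1 - (1 - 0.30)(1 - 2 \times 0.30)\bigr] + 0.5 \times 0.30 = 0.51
\end{aligned}
```

These agree with the 4.96 and 51.0 percent reported for the constructed model.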
Example – Complex Model
• Bayesian Network representing concepts and relationships defined in a set of source documents
• 700(+) concepts
• The Government Customer: “for the first time, we have a computational model of the concepts in this document!”
Model Development Process
1. Extract key concept nodes.
2. Extract influence arcs, specifying…
   • Strength: Absolute, strong, moderate, weak, …
   • Polarity: Positive, negative
3. Create master influence graph.
   • Assemble graph from extracted arcs.
   • Review extracted concepts, influences.
   • Normalize extracted concept names, definitions.
   • Insert missing concepts, influences.
   • Add missing logical structure: AND, OR, Opposite.
4. Compile into standard Bayesian network (BN).
   • Collect concepts’ parent nodes, build conditional probability tables (CPTs) per influence specs.
   • Insert pattern-required nodes for mitigation, relevance.
5. Exploit, improve the model.
   • Run the model against test cases.
   • Review model inferences with SMEs.
   • Revise as appropriate.
Master Influence Graph
(defparameter *Influences*
  '((VisitToAsia
     (:IndicatedBy (:Moderately (Tuberculosis))))
    (Smoking
     (:IndicatedBy
      (:Strongly (LungCancer))
      (:Weakly (Bronchitis
                (:IndicatedBy (:Strongly (Dyspnea)))))))
    (TuberculosisOrCancer
     (:ImpliedByDisjunction (Tuberculosis) (LungCancer))
     (:IndicatedBy (:Strongly (XRay) (Dyspnea))))))
[Figure: original Chest X-Ray network with no evidence, the same belief values as on the Chest X-Ray example slide.]
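The nested influence specification has to be flattened into arcs before parents can be collected for each CPT. The sketch below mirrors the s-expression as Python data and walks it recursively. It assumes that an indicator is a child of the hypothesis it indicates and that disjuncts are parents of their summary node; the data layout and function names are hypothetical.

```python
# Hypothetical Python mirror of the *Influences* s-expression.
# A concept is (name, relations); a relation is either
#   ("IndicatedBy", [(strength, [concept, ...]), ...]) or
#   ("ImpliedByDisjunction", [concept, ...]).
INFLUENCES = [
    ("VisitToAsia", [("IndicatedBy", [("Moderately", [("Tuberculosis", [])])])]),
    ("Smoking", [("IndicatedBy", [
        ("Strongly", [("LungCancer", [])]),
        ("Weakly", [("Bronchitis", [
            ("IndicatedBy", [("Strongly", [("Dyspnea", [])])]),
        ])]),
    ])]),
    ("TuberculosisOrCancer", [
        ("ImpliedByDisjunction", [("Tuberculosis", []), ("LungCancer", [])]),
        ("IndicatedBy", [("Strongly", [("XRay", []), ("Dyspnea", [])])]),
    ]),
]


def flatten(concept, arcs):
    """Append (parent, child, label) arcs for one concept, recursing into nested concepts."""
    name, relations = concept
    for kind, body in relations:
        if kind == "IndicatedBy":
            for strength, indicators in body:
                for indicator in indicators:
                    arcs.append((name, indicator[0], strength))  # hypothesis -> indicator
                    flatten(indicator, arcs)
        elif kind == "ImpliedByDisjunction":
            for part in body:
                arcs.append((part[0], name, "OR"))  # disjunct -> summary
                flatten(part, arcs)
        else:
            raise ValueError(f"unknown relation: {kind}")
    return arcs


def parents_by_node(arcs):
    """Group arc sources by child, which is what CPT construction needs."""
    parents = {}
    for parent, child, _label in arcs:
        parents.setdefault(child, []).append(parent)
    return parents


if __name__ == "__main__":
    all_arcs = []
    for concept in INFLUENCES:
        flatten(concept, all_arcs)
    print(parents_by_node(all_arcs)["Dyspnea"])  # ['Bronchitis', 'TuberculosisOrCancer']
```

Dyspnea ends up with two parents, so its CPT has four rows, matching the two "strong positive indicator" statements in the Chest X-Ray example.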
Implementation
• Netica, NeticaJ API
• Jython
  • Integrates with Java/NeticaJ
  • Higher-level tools on top of NeticaJ
• GraphViz
  • Netica does not do graph layout
  • Build the BN in Netica (all the nodes are on top of each other)
  • Extract nodes and links
  • Use GraphViz to lay out the graph
  • Update each node in Netica with new coordinates
• Franz Lisp / AllegroGraph – Netica API
  • Master Influence Graph
  • Ingesting data and applying evidence to the network
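The GraphViz round trip can be sketched without touching Netica: write the nodes and links as DOT text, run a layout such as `dot -Tplain`, and read the node coordinates back from the plain-text output. The code below covers only the two text-handling ends. The function names are hypothetical, node names are assumed to contain no spaces, and the sample layout text is hand-written in the plain output format, not produced by the authors' tool. Writing the coordinates back into Netica is left out.

```python
def to_dot(nodes, links):
    """Render node names and (parent, child) links as GraphViz DOT text."""
    lines = ["digraph bn {"]
    lines += [f'  "{name}";' for name in nodes]
    lines += [f'  "{parent}" -> "{child}";' for parent, child in links]
    lines.append("}")
    return "\n".join(lines)


def parse_plain(plain_text):
    """Read node centers from `dot -Tplain` output as {name: (x, y)}."""
    coords = {}
    for line in plain_text.splitlines():
        fields = line.split()
        # Node lines look like: node <name> <x> <y> <width> <height> ...
        if len(fields) >= 4 and fields[0] == "node":
            coords[fields[1].strip('"')] = (float(fields[2]), float(fields[3]))
    return coords


# Hand-written sample in the plain output format, for illustration only.
SAMPLE_PLAIN = """graph 1 1.5 2.5
node Smoking 0.75 2.25 1.1 0.5 Smoking solid ellipse black lightgrey
node LungCancer 0.75 0.25 1.4 0.5 LungCancer solid ellipse black lightgrey
edge Smoking LungCancer 4 0.75 1.99 0.75 1.5 0.75 1.0 0.75 0.51 solid black
stop
"""

if __name__ == "__main__":
    print(to_dot(["Smoking", "LungCancer"], [("Smoking", "LungCancer")]))
    print(parse_plain(SAMPLE_PLAIN))
```

Keeping layout in a separate text-based step means the Bayesian network itself never depends on GraphViz; only the node coordinates flow back.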
Limitations & Future Enhancements
• Limitations
  • Binary nodes
  • Limited (but extendable) set of patterns
• Future Enhancements
  • Strength of mitigation / relevance
  • Richer set of qualitative statements
  • Additional CPT models (NoisyAnd, …)
  • Global / local mitigation
  • Visual editor
Conclusion
For a large set of information fusion problems, it is possible to shortcut the ‘humongous knowledge engineering challenge’ by automatically generating a usable Bayesian Network from English-like qualitative statements (and a few numbers).
The resulting Bayesian Network is immediately useful and can serve as a starting point for further knowledge refinement.
Thank You
Contact us: [email protected]
Visit us: www.haystax.com
8251 Greensboro Drive, Suite 1111
McLean, VA 22012