Cognitive Support for Intelligent Survivability Management
Dec 18, 2007
Outline
• Project Summary and Progress Report
  – Goals/Objectives
  – Changes
  – Current status
• Technical Details of Ongoing Tasks
  – Event Interpretation
  – Response Selection
  – Rapid Response (ILC)
  – Learning Augmentation
  – Simulated test bed (simulator)
• Next steps
  – Development and Integration
  – Red team evaluation
Project Summary & Progress Report
Partha Pal
Background
• Outcome of the DARPA OASIS Dem/Val program
  – Survivability architecture
    • Protection, detection, and reaction (defense mechanisms)
    • Synergistic organization of overlapping defense & functionality
  – Demonstrated in the context of an AFRL exemplar (JBI)
  – With knowledge about the architecture, human defenders can be highly effective
    • Even against a sophisticated adversary with significant inside access and privilege (learning exercise runs)
The survivability architecture provides the dials and knobs, but an intelligent control loop, in the form of human experts, was needed for managing them.
What was this knowledge? How did the human defenders use it? Can the intelligent control loop be automated?
Managing: making effective decisions.
Incentives and Obstacles
• Incentives
  – Narrowing the qualitative gap in "automated" cyber-defense decision making
  – Self-managed survivability architecture
    • Self-regenerative systems
  – Next generation of adaptive system technology
    • From hard-coded adaptation rules to cognitive rules to evolutionary ones
• Obstacles (at various levels)
  – Concept: insight (of a sort), but no formalization
  – Implementation: architecture, tool capability & choice
  – Evaluation: how to create a reasonably complex context & wide range of incidents
    • A real system?
  – Evaluation: how to quantify and validate
    • Usefulness and effectiveness
    • Measuring technological advancement
CSISM Objectives
• Design and implement an automated cyber-defense decision-making mechanism
  – Expert-level proficiency
  – Drives a typical defense-enabled system
  – Effective, reusable, easy to port and retarget
• Evaluate it in a wider context and scope
  – Nature and type of events and observations
  – Size and complexity of the system
• Readiness for a real system context
  – Understanding the residual issues & challenges
Main Problem
• Making sense of low-level information (alerts, observations) to drive low-level defense mechanisms (block, isolate, etc.) such that higher-level objectives (survive, continue to operate) are achieved
• Doing it as well as human experts
  – And as well as in other disciplines
• Additional difficulties
  – Rapid, real-time decision making and response
  – Uncertainty due to incomplete and imperfect information
  – Widely varying operating conditions (from no alerts to 100s of alerts per second)
  – New symptoms and changes in the adversary's strategy
For Example…
• Consider a missing protocol message alert
  – Observable: a system-specific alert; A accuses B of omission
  – Interpretation
    • A is not dead (it reported the alert)
    • Is A lying? (corrupt)
    • B is dead
    • B is not dead, just behaving badly (corrupt)
    • A and B cannot communicate
  – Refinement (depending on what else we know about the system, the attacker objective, ...)
    • Other communications between A and B
    • A service is dead if its host is dead
    • OS platform and the likelihood of multi-platform exploits
  – Response selection
    • Now or later?
    • Many options:
      – dead(svc) => restart(svc) | restart(host of svc)
      – cannot-communicate(host1, host2) => ping | retry operation
      – corrupt(host) => reboot(host) | block(host) | quarantine(host)
• Now consider a large number of hosts, sequence of alerts, various adversary objectives, and trying to keep the mission going
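The hypothesis-to-response options above can be sketched as a simple rule table. This is purely illustrative: the predicate and response names echo the slide, but none of this is CSISM code.

```python
# Illustrative rule table mapping a hypothesis to its candidate responses.
# Names (restart, restart_host, ...) mirror the slide's examples.
RESPONSE_RULES = {
    "dead": lambda svc: [("restart", svc), ("restart_host", svc)],
    "cannot_communicate": lambda pair: [("ping", pair), ("retry_operation", pair)],
    "corrupt": lambda host: [("reboot", host), ("block", host), ("quarantine", host)],
}

def candidate_responses(hypothesis, subject):
    """Return the response options the rule table offers for one hypothesis."""
    rule = RESPONSE_RULES.get(hypothesis)
    return rule(subject) if rule else []

# e.g. candidate_responses("corrupt", "hostB")
# -> [('reboot', 'hostB'), ('block', 'hostB'), ('quarantine', 'hostB')]
```

At scale the hard part is not this table but deciding which hypothesis to believe, which is what the rest of the deck addresses.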
Approach
(Figure: the CSISM control loop. A stream of events and observations from the system is Interpreted into hypotheses; Respond selects actions; React carries them out on the system; Learn modifies parameters or policy.)
• Multiple levels of reasoning
• Varying spatial and temporal scope
• Different techniques
• The main control loop is partitioned into two main parts: event interpretation & response selection
Concrete Goals
• A working prototype integrating
  – Policy-based reactive (cyber-defense) response
  – "Cognitive" control loop for system-wide (cyber-defense) event interpretation and response
  – Learning augmentation to modify defense parameters and policies
• Achieve expert-level proficiency
  – In making appropriate cyber-defense decisions
• Evaluation by
  – "Ground truth": ODV operator responses to symptoms caused by the red team
  – Program metrics
Current State
• Accomplished quite a bit in one year
  – KR and reasoning framework for handling cyber-defense events well developed
  – Proof-of-concept capability demonstrated for various components at multiple levels
    • OLC, ILC, Learner, and Simulator
    • E.g., Prover9, Soar (various iterations)
  – Began integration and tackling incidental issues
  – Evaluation ongoing (internal + external)
• Slightly behind in terms of response implementation and integration
  – Various reasons (inherent complexity, and the fact that it is very hard to debug the reasoning mechanism)
    • Longer-term issues: Confidence in such a cognitive engine? Is a system-wide scope really tenable? Is it possible to build better debugging support?
  – Taken mitigating steps (see next)
Significant Changes
Recall the linear flow using various types of knowledge? That was what we were planning in June:

Event Interpretation (on event reports and observations):
• Reason about bad behavior: create the initial baseline interpretation of the reported event and observation; one entity in the system accuses another.
• Reason about info flow: refine the interpretation by considering the potential sources of omission or corruption implied in the accusation.
• Reason about attacker goal: further refinement; reduce the potential set of failures & corruptions by considering attacker objectives & assumptions.
• Reason about the context: additional refinement; eliminate candidate failures and corruptions by considering the current scenario or workflow state.
Each stage narrows the intermediate candidate hypotheses (potential conditions explaining the observed state); a conditional jump leads to Response Selection.

Response Selection:
• Match responses to the candidate hypotheses.
• Select the responses providing the most utility.
• Look ahead a fixed number of steps for possible adversary counter-responses.
Intermediate candidate responses are narrowed until a conditional jump to response engagement selects a response for execution.

This evolved, and the actual flow looks like the following: alerts and observations undergo Translation & Map Down into accusations and evidence; processing the accusations and evidence builds a constraint network, using knowledge about bad behavior (bin 1), info flow (bin 2), attacker goals (bin 3), and protocols and scenarios (bin 4); the network is then pruned (coherence and proof), refined, and garbage collected.
Significant Changes (contd)
• Response mechanism
  – Do it in Jess/Java instead of Soar
  – Issue: getting the state accessible to Jess/Java
• Viewers
  – Dual purpose: usability and debugging
  – Was: rule driven; write a Soar rule to produce what to display
  – Now: get the state from Soar and process it
Schedule
• Midterm release (Aug 2007) [done]
• Red team visit (Early 2008)
• Next release (Feb 2008)
• Code freeze (April 2008)
• Red team exercises (May/June 2008)
Event Interpretation and Response (OLC)
Franklin Webber
OLC Overall Goals
• Interpret alerts and observations
  – (sometimes a lack of observations triggers alerts)
• Find an appropriate response
  – (sometimes it may decide that no response is necessary)
• Housekeep
  – Keep history
  – Clean up
OLC Components
(Figure: OLC components. Accusations and evidence flow into Event Interpretation; Response Selection emits responses; Summary, History, and Learning support both.)
Event Interpretation
Main objectives:
• Essential event interpretation
  – Interpreting events in terms of hypotheses and models
  – Uses deduction and coherence to decide which hypotheses are candidates for response
• Incidental undertakings
  – Protecting the interpretation mechanisms from attack: flooding and resource consumption
• Current status and plans
  – Note that items with a * are in progress
Event interpretation creates candidate hypotheses which can be responded to.
Event Interpretation Decision Flow
(Figure: Event Interpretation decision flow. A generator, theorem proving, and coherence, informed by history and learning, turn incoming events into hypotheses; the resulting claims and dilemmas pass through the Summary to Response Selection.)
Knowledge Representation
• Turn very specific knowledge into an intermediate form amenable to reasoning
  – e.g., "Q2SM sent a malformed Spread message" -> "Q2SM is corrupt"
Specific system inputs are translated into a reusable intermediate form which is used for reasoning.
• Create a graph of inputs and intermediate states to enable reasoning about the whole system
– Accusations and Evidence
– Hypotheses
– Constraints between A and B
• Use the graph to enable deduction via proof and to perform a coherence search
Preparing to Reason
• Observations and alerts are transformed into accusations and evidence
  – Currently the translation is done in Soar but may move outside, to keep the translation and reasoning separate*
Alerts and Observations are turned into Accusations and Evidence that can be reasoned about.
Alerts: notification of an anomalous event
Evidence: generic observation
Accusations: generic alert
Observation: notification of an expected event
Alerts and Accusations
• By using accusations, the universe of bad behavior used in reasoning is limited, with little loss of fidelity.
• The five accusations below are representative of attacks on the system:
  – Value: accused sent malformed data
  – Policy: accused violated a security policy
  – Timing: accused sent well-formed data at the wrong time
  – Omission: expected data was never received from the accused
  – Flood: accused is sending much more data than expected
CSISM uses 5 types of accusations to reason about a potentially infinite number of bad actions that could be reported.
Evidence*
• While accusations capture unexpected behavior, evidence is used for expected behavior
• Evidence limits the universe of expected behavior used in reasoning, with little loss of fidelity
  – Alive: the subject is alive
  – Timely: the subject participated in a timely exchange of information
• Specific "historical" data about interactions is used by the OLC, just not in event interpretation
CSISM uses two types of evidence to represent the occurrence of expected actions for event interpretation.
Hypotheses
• When an accusation is created, a set of hypotheses is proposed that explains the accusation
  – For example, a value accusation means either the accuser or the accused is corrupt, and that the accuser is not dead
• The following hypotheses (both positive and negative) can be proposed
  – Dead: subject is dead; fail-stop failure
  – Corrupt: subject is corrupt
  – Communication-Broken: subject has lost connectivity
  – Flooded: subject is starved of critical resources
  – OR: a meta-hypothesis that one of a number of related hypotheses is true
Accusations lead to hypotheses about the cause of the accusation.
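The accusation-to-hypothesis step can be sketched as below. This is an illustrative reconstruction from the slide's examples, not CSISM code; the tuple encoding of "not" and "OR" hypotheses is my own.

```python
# Illustrative sketch: propose hypotheses that could explain an accusation.
# Per the slides, a value accusation implies the accuser is not dead and
# that either the accuser or the accused is corrupt.
def propose_hypotheses(kind, accuser, accused):
    if kind == "value":
        return [
            ("not", ("dead", accuser)),  # accuser reported, so it is alive
            ("or", [("corrupt", accuser), ("corrupt", accused)]),
        ]
    if kind == "omission":
        return [
            ("not", ("dead", accuser)),
            ("or", [("dead", accused), ("corrupt", accused),
                    ("communication-broken", (accuser, accused)),
                    ("corrupt", accuser)]),  # the accuser may be lying
        ]
    return []  # other accusation kinds omitted in this sketch

hyps = propose_hypotheses("value", "A", "B")
```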
Reasoning Structure
• Hypotheses, accusations, and evidence are connected using constraints
• The resulting graph is used for
  – Coherence search
  – Proving system facts
A graph is created to enable reasoning about hypotheses.
(Figure: a constraint graph linking an accusation through an OR meta-hypothesis to host-dead, comm-broken, and host-corrupt hypotheses; supporting constraints carry positive weights (100) and mutually exclusive ones negative weights (-100, -400).)
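A toy version of such a constraint network can be built as below. The +100/-400 weights echo the figure, but the structure is illustrative, not the CSISM encoding.

```python
import itertools

# Illustrative constraint network: each alternative explanation supports the
# accusation (positive weight); competing explanations inhibit each other
# (negative weight). Weights echo the figure's 100 / -400 values.
def build_network(accusation, alternatives):
    edges = {}
    for h in alternatives:
        edges[(accusation, h)] = 100      # explanation supports the accusation
    for h1, h2 in itertools.combinations(alternatives, 2):
        edges[(h1, h2)] = -400            # competing explanations conflict
    return edges

net = build_network("value(A,B)",
                    ["corrupt(A)", "corrupt(B)", "comm-broken(A,B)"])
```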
Proofs about the System
• The OLC needs to derive as much certain information as it can, but it needs to do this very quickly. The OLC does model-theoretic reasoning to find hypotheses that are theorems (i.e., always true) or necessarily false
• For example, it can assume the attacker has a single platform exploit, and consider each platform in turn, finding which hypotheses are true or false in all cases. Then it can assume the attacker has exploits for two platforms and repeat the process
• A hypothesis can be proven true or proven false or have an unknown proof status
• Claims: Hypotheses that are proven true
“Claims” are definite candidates for response
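The case analysis described above can be sketched as follows. This is an assumed structure, not the actual Prover9/Soar encoding: assume the attacker holds exploits for k platforms, enumerate every choice, and keep only hypotheses true in all cases (claims) or false in all cases.

```python
from itertools import combinations

# Sketch of proof by exhaustive case analysis over attacker exploit assumptions.
# holds(hyp, exploited_platforms) -> bool evaluates a hypothesis in one case.
def prove(hypotheses, platforms, holds, k=1):
    cases = list(combinations(platforms, k))
    claims, refuted = [], []
    for hyp in hypotheses:
        values = [holds(hyp, set(c)) for c in cases]
        if all(values):
            claims.append(hyp)     # theorem: true in every case
        elif not any(values):
            refuted.append(hyp)    # false in every case
    return claims, refuted         # everything else stays "unknown"

# toy model (invented for illustration): "corrupt(L1)" requires a Linux exploit
platforms = ["linux", "windows", "solaris"]
def holds(hyp, exploited):
    if hyp == "corrupt(L1)":
        return "linux" in exploited
    return True

claims, refuted = prove(["corrupt(L1)", "alive(A)"], platforms, holds, k=1)
# "alive(A)" is a claim; "corrupt(L1)" stays unknown (true only in one case)
```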
Coherence
• Coherence partitions the system into clusters that make sense together
  – For example, for a single accusation either the accuser or the accused may be corrupt, but these hypotheses will cluster apart
• Responses can be made on the basis of the partition, or partition membership, when a proof is not available*
In the absence of provable information coherence may enable actions to be taken.
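A minimal coherence sketch, assuming a simple hill-climbing search over an accepted/rejected split (the real algorithm is not specified in the slides): positive constraints pull nodes onto the same side, negative constraints push them apart.

```python
# Illustrative coherence search: greedily move each node to whichever side
# (accepted vs. rejected) better satisfies its weighted constraints.
def coherence(nodes, edges, iters=20):
    accepted = set(nodes)                 # start by accepting everything
    for _ in range(iters):
        changed = False
        for n in nodes:
            gain = 0                      # benefit of n being "accepted"
            for (a, b), w in edges.items():
                if n in (a, b):
                    other = b if a == n else a
                    same_side = other in accepted
                    gain += w if same_side else -w
            if gain < 0 and n in accepted:
                accepted.discard(n); changed = True
            elif gain >= 0 and n not in accepted:
                accepted.add(n); changed = True
        if not changed:
            break
    return accepted, set(nodes) - accepted

# the slide's example: accuser-corrupt and accused-corrupt cluster apart
nodes = ["accusation", "corrupt(A)", "corrupt(B)"]
edges = {("accusation", "corrupt(A)"): 100,
         ("accusation", "corrupt(B)"): 100,
         ("corrupt(A)", "corrupt(B)"): -400}
accepted, rejected = coherence(nodes, edges)
```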
Protection and Cleanup
• Without oversight, resources can be overwhelmed
  – Due to flooding: we rate-limit incoming messages*
  – Due to excessive information accumulation: we take two approaches to mitigate it*
    • Removing outdated information by making it inactive
      – If some remedial action has cleared up a past problem
      – If new information makes previous information outdated or redundant
      – If old information contradicts new information
      – If an inconsistency occurs, we remove low-confidence information until the inconsistency is removed
    • When resources are very constrained, more drastic measures are taken
      – Hypotheses that have not been acted upon for some time will be removed, along with related accusations
Resources are reclaimed and managed to prevent uncontrolled data loss or corruption.
Current Status and Future Plans
• Knowledge representation
  – Accusation translation is implemented
    • May need to change to better align with the evidence
  – Evidence implementation is in process
    • Will leverage the code and structure for accusation generation
  – Use of the coherence partition in response selection: ongoing
• Protection and cleanup are being implemented
  – Flood control development is ongoing
  – The active/inactive distinction is designed and ready to implement
  – Drastic hypothesis removal is still being designed
Much has been accomplished; work still remains.
Response Selection
Main objectives:
• Decide promptly how to react to an attack
• Block the attack in most situations
• Make "gaming" the system difficult
  – Reaction based on high-confidence event interpretation
  – History of responses is taken into account when selecting the next response
  – Not necessarily deterministic
Response Selection Decision Flow
(Figure: Response Selection decision flow. Claims and dilemmas from Event Interpretation drive a propose step; a prune step, informed by history, the Summary, and learning, narrows the proposed responses to the potentially useful ones.)
Response Terminology
• A response is an abstract OLC action, described generically
  – Example: quarantine(X), where X could be a host, file, process, memory segment, network segment, etc.
• A response is carried out as a sequence of response steps
  – Steps for quarantine(X) && isHost(X) include:
    • Reconfigure process protection domains on X
    • Reconfigure firewall local to X
    • Reconfigure firewalls remote to X
  – Steps for quarantine(X) && isFile(X) include:
    • Mark the file non-executable
    • Take a specimen, then delete
• A command is the input to the actuators that implement a single response step
  – Use /sbin/iptables to reconfigure software firewalls
  – Use ADF Policy Server commands to reconfigure ADF cards
  – Use tripwire commands to scan file systems
(Figure: a response specializes into alternative responses (or); a response decomposes (and) into steps, and each step into actuator commands.)
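The response/step/command hierarchy above can be sketched as two lookup functions. The step and command strings are paraphrased from the slide; the structure, not the exact strings, is the point.

```python
# Illustrative decomposition: response -> steps -> actuator commands.
def steps_for(response, target, target_kind):
    if response == "quarantine" and target_kind == "host":
        return [f"reconfigure process protection domains on {target}",
                f"reconfigure firewall local to {target}",
                f"reconfigure firewalls remote to {target}"]
    if response == "quarantine" and target_kind == "file":
        return [f"mark {target} non-executable",
                f"take specimen of {target}, then delete"]
    return []

def commands_for(step):
    # Each step maps to actuator commands, e.g. iptables for firewall steps.
    if "firewall" in step:
        return [f"/sbin/iptables ... # {step}"]
    return [f"# actuator command for: {step}"]

steps = steps_for("quarantine", "hostA", "host")
```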
Kinds of Response
• Refresh – e.g., start from checkpoint
• Reset – e.g., start from scratch
• Isolate -- permanent
• Quarantine/unquarantine -- temporary
• Downgrade/upgrade – services and resources
• Ping – check liveness
• Move – migrate component
The DPASA design used all of these except 'move'. The OLC design has similar emphasis.
Response Selection Phases
• Phase I: propose
  – The set of claims (hypotheses that are likely true) implies a set of possibly useful responses
• Phase II: prune
  – Discard lower priority
  – Discard based on history
  – Discard based on lookahead
  – Choose between incompatible alternatives
  – Choose unpredictably if possible
• Learning algorithm will tune Phase II parameters
Example
• Event interpretation claims "Q1PSQ is corrupt"
• Relevant knowledge:
  – PSQ is not checkpointable
• Propose:
  – (A) Reset Q1PSQ, i.e., reboot, or
  – (B) Quarantine Q1PSQ using a firewall, or
  – (C) Isolate Quad 1
• Prune:
  – Reboot has already been tried, so discard (A)
  – Q1PSQ is not critical, so no need to discard (B)
  – Prefer (B) to (C) because it is more easily reversible, but override if too many previous anomalies in Quad 1
• Learning:
  – Modify the definition of "too many" used when pruning (B)
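This pruning walk-through can be restated as a sketch. The thresholds, flags, and the fallback choice are invented for illustration; only the discard/prefer logic follows the slide.

```python
# Illustrative pruning: drop already-tried and too-disruptive responses,
# then prefer the more easily reversible option unless the quad looks bad.
def prune(proposals, history, critical, anomalies_in_quad, too_many=3):
    keep = [p for p in proposals if p not in history]          # already tried
    keep = [p for p in keep
            if not (p.startswith("quarantine") and critical)]  # too disruptive
    if "quarantine(Q1PSQ)" in keep and anomalies_in_quad < too_many:
        return "quarantine(Q1PSQ)"        # reversible, so preferred
    return keep[-1] if keep else None     # fall back to the broader response

choice = prune(["reset(Q1PSQ)", "quarantine(Q1PSQ)", "isolate(Quad1)"],
               history={"reset(Q1PSQ)"}, critical=False, anomalies_in_quad=1)
# choice == "quarantine(Q1PSQ)"
```

A learning step could then adjust `too_many` over time, as the slide suggests.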
Using Lookahead for Pruning
• Event interpretation provides an intelligent guess about the attacker’s capability
• OLC rules encode knowledge about attacker’s possible goals
• Lookahead estimates the potential future state, given assumptions about capability, goals, and response selection
• If response X has better potential future than Y, favor X
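A minimal fixed-depth lookahead in the spirit of the slide, assuming an alternating defender/attacker game over an abstract state (the state model and scoring are invented; the slides do not specify them):

```python
# Illustrative lookahead: score each response by simulating the attacker's
# best counter-move for a fixed number of steps; keep the response with the
# best worst-case outcome.
def lookahead(state, responses, attacker_moves, apply_move, score, depth=2):
    def value(s, d, defender_turn):
        if d == 0:
            return score(s)
        moves = responses if defender_turn else attacker_moves
        outcomes = [value(apply_move(s, m), d - 1, not defender_turn)
                    for m in moves]
        return max(outcomes) if defender_turn else min(outcomes)
    return max(responses,
               key=lambda r: value(apply_move(state, r), depth - 1, False))

# toy numeric state: blocking helps (+2), the attacker's exploit hurts (-1)
deltas = {"block": 2, "wait": 0, "exploit": -1}
best = lookahead(0, ["block", "wait"], ["exploit"],
                 lambda s, m: s + deltas[m], lambda s: s)
# best == "block"
```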
Status
• Design
  – Rules for proposing responses encoded in first-order logic
  – Corresponding pruning rules described in English
• Implementation
  – Mounting responses for given hypotheses prototyped in Soar
  – Actual response logic is being moved outside Soar
    • Risk mitigation step
  – Some specific to a particular Learning Exercise run
Much less complete than Event Interpretation, but we are getting there…
Fast Containment Response and Policies
Michael Atighetchi (on behalf of Steve Harp)
Inner Loop Controller (ILC) Objectives
• Policy driven: implements policies and tactics from the OLC on a single host
• Autonomous: high-speed response can work when disconnected from the OLC by an attack or failure
• Flexible: policies can be updated at any time
• Adaptive: uses learned characteristics of the host and monitored services to tune the policy
• Low impact on mission: able to back out of defensive decisions when warranted
(Figure: ILC architecture on a host. The ILC, backed by a Policy DB, a Checkpoint DB, and an HW/OS watchdog, drives an AppController and AppFactory that instantiate applications (App1, App2); sensors and actuators connect the policy layer to the Outer Loop Control and remote applications, with separate control and data paths.)
Goal: Attempt to contain and correct problems at the earliest stage possible
Current ILC Policies
Generic:
• Excessive resource consumption (memory, CPU load)
• Tampering with component files
• Unexpected service termination
• Unexpected network connections
Specialized:
• JBI client registration failure
• ADF agent failure modes
• SELinux enforcement status
Policy Example
(Policy flow: a file integrity check fails on F1 -> local file F1 status = corrupted -> file tampering problem -> file cleanup subgoal -> block network and suspend affected services (quarantine) -> restore the file -> remove the block/suspend reasons.)
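The file-tampering policy above can be sketched as condition/action handlers. This is illustrative Python; the real ILC encodes such policies as Jess rules, and the function and action names here are hypothetical.

```python
# Illustrative condition/action handlers for the file-tampering policy.
def on_integrity_failure(path, state):
    state[path] = "corrupted"
    # quarantine first: contain the problem before cleanup
    return ["block_network", "suspend_affected_services"]

def on_file_restored(path, state):
    state[path] = "clean"
    # back out of the defensive decisions once the reason is gone
    return ["remove_block", "remove_suspend"]

state = {}
actions = on_integrity_failure("/etc/passwd", state)
actions += ["restore_file"]
actions += on_file_restored("/etc/passwd", state)
```

The back-out step is what keeps the mission impact low: once the file is restored, the block and suspend reasons are removed.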
ILC: Recent Work
• ILC configuration
  – Made it driven by an (ontology) model of DPASA
  – No need for configuration files for each host
  – The ILC runs on each simulated host and automatically loads the required policy components
• Integration
  – Hitherto the ILC had been (unit) tested stand-alone
  – Initial integration with the Jess DPASA simulator is complete; broadening support for multiple policies
  – Adjustments to the API to match the simulator
ILC: Current Status
• ILC policy to handle various applications
• Model-driven configuration
• Metrics
  – Rules: 94; Functions: 134; Frames: 24; Globals: 20
  – Base reaction time (in unit test): ~4 ms (measuring the inference part only)
  – Target reaction time: < 100 ms
ILC: Ongoing Work
• Complete integration with the rest of the CSISM framework:
  – DPASA simulator
  – ILC-OLC interaction
    • Designed; integration: TBD
• Testing
  – Verify correct reactions in the simulator to various simulated attacks
  – Measure reaction times
Learning Augmentation
Michael Atighetchi
(On behalf of Karen Haigh)
Learning Augmentation: Motivation
• Why learning?
  – Extremely difficult to capture all the complexities of the system, particularly interactions among activities
  – The system is dynamic (a static configuration gets out of date)
• Core challenge: adaptation is the key to survival
• Training options compared:
  – Human: + good data; - complex environment; - dynamic system
  – Offline training: + good data; + complex environment; - dynamic system
  – Online training: - unknown data; + complex environment; + dynamic system
  – CSISM's experimental sandbox: + good data (self-labeled); + complex environment; + dynamic system
• Very hard for the adversary to "train" the learner!
• The sandbox approach was successfully tried in SRS phase 1
Development Plan for Learning in CSISM
1. Responses under normal conditions (calibration)
   • Important first step because it learns how to respond to normal conditions
   • Shown at the June PI meeting
2. Situation-dependent responses under attack conditions (since June)
3. Multi-stage attacks (since June)
Calibration Results for all Registration Times (Beta = 0.0005)
(Chart, shown at the June '07 PI meeting: two "shoulder" points indicate the upper and lower limits. As more observations are collected, the estimates become more confident of the range of expected values, i.e., tighter estimates to observations.)
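One possible shape of the calibration step, assumed on my part (the slide shows only a beta parameter and a band that tightens around the observed range): exponentially weighted tracking of the expected limits of registration times.

```python
# Assumed sketch of online limit calibration: nudge the limits toward each
# observation at rate beta; over many samples the [lo, hi] band tightens
# around the range of values actually seen.
def update_limits(lo, hi, observation, beta=0.0005):
    lo = min(observation, lo + beta * (observation - lo))
    hi = max(observation, hi + beta * (observation - hi))
    return lo, hi

lo, hi = 0.0, 1000.0                  # wide initial band
for t in [40, 55, 38, 61, 47]:       # observed registration times
    lo, hi = update_limits(lo, hi, t)
```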
Multistage Attacks
• Multistage attacks involve a sequence of actions that span multiple hosts and take multiple steps to succeed
  – A sequence of actions with causal relationships
  – An action A must occur to set up the initial conditions for action B; action B would have no effect without previously executing action A
• Challenge: identify which observations indicate the necessary and sufficient elements of an attack (credit assignment), despite:
  – Incidental observations that are either side effects of normal operations or chaff explicitly added by an attacker to divert the defender
  – Concealment (e.g., to remove evidence)
  – Probabilistic actions (e.g., to improve the probability of success)
Architectural Schema for Learning of Attack Theories and Situation-Dependent Responses
(Figure: CSISM sensors (ILC, IDS) produce observations ending in failure of the protected system; only some are essential. An attack theory experimenter generates viable attack theories, and a defense measures experimenter derives viable defense strategies and detection rules; both are tested in a "sandbox" that feeds back observations, actions, and failure signals.)
Multi-Stage Learner
• Do {
  – Generate a theory according to a heuristic
    • The complete set of theories is all permutations of all members of Powerset(observations)
  – Test the theory (the hard part!)
  – Incrementally update the OLC/ILC rulebase
• } while theories remain
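The generate/test/update loop above can be sketched as below, with a "shortest first" heuristic standing in for the ones on the next slide. Everything here is illustrative; the real learner tests theories in the sandbox and updates the OLC/ILC rulebase as it goes.

```python
from itertools import combinations, permutations

# All permutations of all members of the powerset of the observations,
# generated lazily in shortest-first order.
def theories(observations):
    for size in range(1, len(observations) + 1):
        for subset in combinations(observations, size):
            yield from permutations(subset)

def learn(observations, test_theory):
    valid = []
    for theory in theories(observations):
        if test_theory(theory):      # run the theory in the sandbox
            valid.append(theory)     # would also update OLC/ILC rules here
    return valid

# toy sandbox: the real attack is A followed by C
valid = learn(["A", "B", "C"], lambda t: t == ("A", "C"))
# valid == [("A", "C")]
```

Generating theories lazily is what makes the combinatorics tolerable: with 10 observations there are ~10 million potential trials, so the ordering heuristic determines how soon the shortest valid attacks are found.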
Heuristics & Structure of Results
• Primary goal: find all shortest valid attacks (i.e., the minimum required subset) as soon as possible
  – Example: in ABCDE, AC and DE may both be valid
• Secondary goal: find all valid attacks as soon as possible
  – Example: in ABCDE, ABC may also be valid
• Heuristics
  – Shortest first
  – Longest first
  – Edit distance to the original
  – Dynamic resort to the valid set
    • Initially, edit distance to the original attack; remaining theories are compared to all valid attacks and the edit distance is averaged
  – Dynamic resort / free to remove "chaff"
    • Same as "dynamic resort to the valid set", but the cost of deletion is zero
• Worst-case comparison: sort the theories so that the shortest valid attack is found last and all valid attacks come at the end
Comparison of Heuristics
(Chart: a 4-observation, 3-stage attack; 4 observations = 64 potential trials, 10 observations = 10 million potential trials.)
Incremental Hypothesis Generation
• The enhanced query learner generates attack hypotheses
  – incrementally, with low memory overhead, so it is able to explore large observation spaces (>>8 steps)
  – in heuristic order, to acquire the concept rapidly
• Heuristic bias:
  – look for shorter attacks first (adjustable prior)
  – suspect the order of steps has an influence
  – suspect steps interact positively (for the attacker)
  – performance comparable to edit-dist + length
Status, Development Plan & Future Steps
At the June '07 PI meeting:
1. Responses under normal conditions (calibration)
   a. Analyze DPASA data (done)
   b. Integrate with ILC (single node) (done)
   c. Add experimentation sandbox (single node)
   d. Calibrate across nodes
2. Situation-dependent responses under attack conditions
3. Multi-stage attacks
Since June:
1. Development of the sandbox and initial integration efforts with the learner (done)
   • Attack actions, observations, and control actions
   • Quality signal
2. Development of the multistage algorithm (version 1.0 done)
   • Theories with the sandbox
   • Incremental generation of theories
   • TODO: ILC input / OLC output
Simulated Testbed
Michael Atighetchi
(on behalf of Michael Atighetchi)
Why Simulation?
(Figure: (a) the path topology between client and core in the defense-enabled JBI as tested under OASIS Dem/Val: a client host (SD, LC, Reg, Linux/Windows, Vera, ADF), Cisco PIX firewalls and a Cisco 3750 switch over physical CAT5 wiring, an access proxy host (CSA, SELinux LSM), and proxies (DCProx, PSQProx, CorProx, TsProx) in front of the base server. (b) virtualization of (a) in the simulation of the defense-enabled system: real components for the CSISM logic (ILC, Reg, FSI), virtual mechanisms for FW, OS, VPN, and WAN, with connectivity, data flow, and control flow shown.)
• Use a specification
• Use as integration middleware
• Use for red team experimentation
JessSim: The JESS Simulator
(Figure: JessSim structure. World facts — Host, NIC, C3750, SELinux, CSA, Tripwire, LC, SM, ... — are generated via a Protégé plugin. Cross-object inference produces inferred facts such as can-talk and mech-on-path; application-level protocols produce state facts such as register-client and publish-IO. Inference and protocols are implemented via JESS rules and functions.)
JessSim Current Status
Implemented protocols (14): Plumbing (5 rules), Alert (6 rules), Registration (8 rules), SELinux (1 rule), Reboot (3 rules), LC message (3 rules), ADF (3 rules), Heartbeat (1 rule), PSQ (3 rules), Tripwire (3 rules), ServiceControl (1 rule), POSIX signals (1 rule), Process memory/CPU status (2 rules), Host memory/CPU status (2 rules)
Implemented attacks (8):
• Availability: disable the SELinux service
• Availability: shut down a host
• Availability: cause a Downstream Controller to crash
• Availability: corrupt endpoint references in SMs
• Availability: kill processes via kill -9
• Integrity: corrupt files
• Policy violation: create a new (rogue) process
• Availability: cause a process to overload the CPU
• Test coverage:
  – Unit tests: 28 JUnit tests covering protocol details
  – OASIS Dem/Val: main events of DPASA Run #6
• Fidelity
  – Focused on application-level protocols
JessSim Ongoing Work
• Increase fidelity of the network simulation
  – Checks for network connectivity [crash(router) => comm-broken(A, B)]
  – Simulation of TCP/IP flows for the ILC
• Increase fidelity of the host simulation for the ILC
  – install-network-block / remove-network-block
  – note-network-connection / reset-network-connection
  – quarantine-file / restore-file / delete-file / checkpoint-file
  – note-selinux-down / note-selinux-up
  – shun-network-address / unshun-network-address
  – enable-interface / disable-interface
  – set-boot-option
• Protocols for ILC/OLC communication
  – forward-to-olc()
• Cleanup
  – Convert all time units to seconds in all scenarios
Next Steps: Integration and Evaluation
Partha Pal
Learning Integration
• ILC learning:
  – Pre-deployment calibration: learn threshold parameters for registration times
  – Calibrate across nodes
• OLC learning:
  – Results from learning with the experimentation sandbox
  – Parameter tuning
  – New rules/heuristics
ILC <-> OLC Integration
• ILC -> OLC
  – Calls to the OLC implemented in ILC policies via calls to ilc-api
  – ILC as an informant to the OLC
  – ILC as a henchman of the OLC
• OLC -> ILC
  – The OLC can process alerts forwarded to it from the ILC
  – Consider the ILC as a mechanism during response selection
JessSim Integration
• ILC integration with JessSim
  – The "ArrestRunawayProcess" loop is working
  – Implement the file, network, and reboot protocols necessary to support the other existing ILC loops
• OLC integration with JessSim
  – The OLC is fully integrated with JessSim
  – Adjust the integration given changes due to
    • moving the transcription logic (Alerts -> Accusations, Observations -> Evidence) into Jess
    • performing response selection in Jess
• Integration framework
  – All components execute within a single JVM
  – Support execution of the ILC and OLC on dedicated hosts to measure timeliness
Integration Framework: Current Status
(Figure: current integration framework. All components run in a single JVM on the simulator host: simulated hosts (Q1AP, Q1SM, Q1DC, Q1PSQ, Q1PS, AODB), an HP switch, a Cisco 3750, and an ADF NIC, with the ILC instances and the OLC as real code. An attacker corrupts a file; alerts and observations flow from the simulated hosts to the ILC and OLC, and commands flow back, in steps 1-8.)
Integration Framework: Needed for Red Team Experimentation
(Figure: the integration framework needed for red team experimentation. Same topology and alert/command flow as the current status, but the ILC instances and the OLC each run in their own JVMs on dedicated ILC and OLC hosts, separate from the simulator host.)
Evaluation
• Interaction with the red and white teams
  – Initial telecon (late October)
  – Continued technical interchange about CSISM capabilities
  – Potential gaps/disagreements
    • How to use the simulator
    • Evaluation goals
  – Next steps
    • Demonstration of the system
    • Red team visit
    • Code drop