Cognitive Support for Intelligent Survivability Management
Dec 18, 2007
Outline
• Project Summary and Progress Report
  – Goals/Objectives
  – Changes
  – Current status
• Technical Details of Ongoing Tasks
  – Event Interpretation
  – Response Selection
  – Rapid Response (ILC)
  – Learning Augmentation
  – Simulated test bed (simulator)
• Next steps
  – Development and Integration
  – Red team evaluation
Project Summary & Progress Report
Partha Pal
Background
• Outcome of the DARPA OASIS Dem/Val program
  – Survivability architecture
    • Protection, detection, and reaction (defense mechanisms)
    • Synergistic organization of overlapping defense & functionality
  – Demonstrated in the context of an AFRL exemplar (JBI)
  – With knowledge about the architecture, human defenders can be highly effective
    • Even against a sophisticated adversary with significant inside access and privilege (learning exercise runs)
The survivability architecture provides the dials and knobs, but an intelligent control loop, in the form of human experts, was needed for managing them.
What was this knowledge? How did the human defenders use it? Can the intelligent control loop be automated?
Managing: making effective decisions.
Incentives and Obstacles
• Incentives
  – Narrowing the qualitative gap in "automated" cyber-defense decision making
  – Self-managed survivability architecture
    • Self-regenerative systems
  – Next generation of adaptive system technology
    • From hard-coded adaptation rules to cognitive rules to evolutionary ones
• Obstacles (at various levels)
  – Concept: insight (of a sort), but no formalization
  – Implementation: architecture, tool capability & choice
  – Evaluation: how to create a reasonably complex context & wide range of incidents
    • A real system?
  – Evaluation: how to quantify and validate
    • Usefulness and effectiveness
    • Measuring technological advancement
CSISM Objectives
• Design and implement an automated cyber-defense decision-making mechanism
  – Expert-level proficiency
  – Drives a typical defense-enabled system
  – Effective, reusable, easy to port and retarget
• Evaluate it in a wider context and scope
  – Nature and type of events and observations
  – Size and complexity of the system
• Readiness for a real system context
  – Understanding the residual issues & challenges
Main Problem
• Making sense of low-level information (alerts, observations) to drive low-level defense mechanisms (block, isolate, etc.) such that higher-level objectives (survive, continue to operate) are achieved
• Doing it as well as human experts
  – And as well as in other disciplines
• Additional difficulties
  – Rapid, real-time decision making and response
  – Uncertainty due to incomplete and imperfect information
  – Widely varying operating conditions (from no alerts to 100s of alerts per second)
  – New symptoms and changes in the adversary's strategy
For Example…
• Consider a missing protocol message alert
  – Observable: a system-specific alert; A accuses B of omission
  – Interpretation
    • A is not dead (it reported the alert)
    • Is A lying? (corrupt)
    • B is dead
    • B is not dead, just behaving badly (corrupt)
    • A and B cannot communicate
  – Refinement (depending on what else we know about the system, the attacker objective, ...)
    • Other communications between A and B
    • A service is dead if its host is dead
    • OS platform and the likelihood of multi-platform exploits
  – Response selection
    • Now or later?
    • Many options:
      – dead(svc) => restart(svc) | restart(host of svc)
      – cannot-communicate(host1, host2) => ping | retry operation
      – corrupt(host) => reboot(host) | block(host) | quarantine(host)
• Now consider a large number of hosts, sequence of alerts, various adversary objectives, and trying to keep the mission going
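The hypothesis-to-response options above can be sketched as a simple rule table. This is purely illustrative: the predicate and response names echo the slide, but none of this is CSISM code.

```python
# Illustrative rule table mapping a hypothesis to its candidate responses.
# Names (restart, restart_host, ...) mirror the slide's examples.
RESPONSE_RULES = {
    "dead": lambda svc: [("restart", svc), ("restart_host", svc)],
    "cannot_communicate": lambda pair: [("ping", pair), ("retry_operation", pair)],
    "corrupt": lambda host: [("reboot", host), ("block", host), ("quarantine", host)],
}

def candidate_responses(hypothesis, subject):
    """Return the response options the rule table offers for one hypothesis."""
    rule = RESPONSE_RULES.get(hypothesis)
    return rule(subject) if rule else []

# e.g. candidate_responses("corrupt", "hostB")
# -> [('reboot', 'hostB'), ('block', 'hostB'), ('quarantine', 'hostB')]
```

At scale the hard part is not this table but deciding which hypothesis to believe, which is what the rest of the deck addresses.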
Approach
(Figure: the CSISM control loop. A stream of events and observations from the system is Interpreted into hypotheses; Respond selects actions; React carries them out on the system; Learn modifies parameters or policy.)
• Multiple levels of reasoning
• Varying spatial and temporal scope
• Different techniques
• The main control loop is partitioned into two main parts: event interpretation & response selection
Concrete Goals
• A working prototype integrating
  – Policy-based reactive (cyber-defense) response
  – "Cognitive" control loop for system-wide (cyber-defense) event interpretation and response
  – Learning augmentation to modify defense parameters and policies
• Achieve expert-level proficiency
  – In making appropriate cyber-defense decisions
• Evaluation by
  – "Ground truth": ODV operator responses to symptoms caused by the red team
  – Program metrics
Current State
• Accomplished quite a bit in one year
  – KR and reasoning framework for handling cyber-defense events well developed
  – Proof-of-concept capability demonstrated for various components at multiple levels
    • OLC, ILC, Learner, and Simulator
    • E.g., Prover9, Soar (various iterations)
  – Began integration and tackling incidental issues
  – Evaluation ongoing (internal + external)
• Slightly behind in terms of response implementation and integration
  – Various reasons (inherent complexity, and the fact that it is very hard to debug the reasoning mechanism)
    • Longer-term issues: Confidence in such a cognitive engine? Is a system-wide scope really tenable? Is it possible to build better debugging support?
  – Taken mitigating steps (see next)
Significant Changes
Recall the linear flow using various types of knowledge? That was what we were planning in June:

Event Interpretation (on event reports and observations):
• Reason about bad behavior: create the initial baseline interpretation of the reported event and observation; one entity in the system accuses another.
• Reason about info flow: refine the interpretation by considering the potential sources of omission or corruption implied in the accusation.
• Reason about attacker goal: further refinement; reduce the potential set of failures & corruptions by considering attacker objectives & assumptions.
• Reason about the context: additional refinement; eliminate candidate failures and corruptions by considering the current scenario or workflow state.
Each stage narrows the intermediate candidate hypotheses (potential conditions explaining the observed state); a conditional jump leads to Response Selection.

Response Selection:
• Match responses to the candidate hypotheses.
• Select the responses providing the most utility.
• Look ahead a fixed number of steps for possible adversary counter-responses.
Intermediate candidate responses are narrowed until a conditional jump to response engagement selects a response for execution.

This evolved, and the actual flow looks like the following: alerts and observations undergo Translation & Map Down into accusations and evidence; processing the accusations and evidence builds a constraint network, using knowledge about bad behavior (bin 1), info flow (bin 2), attacker goals (bin 3), and protocols and scenarios (bin 4); the network is then pruned (coherence and proof), refined, and garbage collected.
Significant Changes (contd)
• Response mechanism
  – Do it in Jess/Java instead of Soar
  – Issue: getting the state accessible to Jess/Java
• Viewers
  – Dual purpose: usability and debugging
  – Was: rule driven; write a Soar rule to produce what to display
  – Now: get the state from Soar and process it
Schedule
• Midterm release (Aug 2007) [done]
• Red team visit (Early 2008)
• Next release (Feb 2008)
• Code freeze (April 2008)
• Red team exercises (May/June 2008)
Event Interpretation and Response (OLC)
Franklin Webber
OLC Overall Goals
• Interpret alerts and observations
  – (sometimes a lack of observations triggers alerts)
• Find an appropriate response
  – (sometimes it may decide that no response is necessary)
• Housekeep
  – Keep history
  – Clean up
OLC Components
(Figure: OLC components. Accusations and evidence flow into Event Interpretation; Response Selection emits responses; Summary, History, and Learning support both.)
Event Interpretation
Main objectives:
• Essential event interpretation
  – Interpreting events in terms of hypotheses and models
  – Uses deduction and coherence to decide which hypotheses are candidates for response
• Incidental undertakings
  – Protecting the interpretation mechanisms from attack: flooding and resource consumption
• Current status and plans
  – Note that items with a * are in progress
Event interpretation creates candidate hypotheses which can be responded to.
Event Interpretation Decision Flow
(Figure: Event Interpretation decision flow. A generator, theorem proving, and coherence, informed by history and learning, turn incoming events into hypotheses; the resulting claims and dilemmas pass through the Summary to Response Selection.)
Knowledge Representation
• Turn very specific knowledge into an intermediate form amenable to reasoning
  – e.g., "Q2SM sent a malformed Spread message" -> "Q2SM is corrupt"
Specific system inputs are translated into a reusable intermediate form which is used for reasoning.
• Create a graph of inputs and intermediate states to enable reasoning about the whole system
– Accusations and Evidence
– Hypotheses
– Constraints between A and B
• Use the graph to enable deduction via proof and to perform a coherence search
Preparing to Reason
• Observations and alerts are transformed into accusations and evidence
  – Currently the translation is done in Soar but may move outside, to keep the translation and reasoning separate*
Alerts and Observations are turned into Accusations and Evidence that can be reasoned about.
Alerts: notification of an anomalous event
Evidence: generic observation
Accusations: generic alert
Observation: notification of an expected event
Alerts and Accusations
• By using accusations, the universe of bad behavior used in reasoning is limited, with little loss of fidelity.
• The five accusations below are representative of attacks on the system:
  – Value: accused sent malformed data
  – Policy: accused violated a security policy
  – Timing: accused sent well-formed data at the wrong time
  – Omission: expected data was never received from the accused
  – Flood: accused is sending much more data than expected
CSISM uses 5 types of accusations to reason about a potentially infinite number of bad actions that could be reported.
Evidence*
• While accusations capture unexpected behavior, evidence is used for expected behavior
• Evidence limits the universe of expected behavior used in reasoning, with little loss of fidelity
  – Alive: the subject is alive
  – Timely: the subject participated in a timely exchange of information
• Specific "historical" data about interactions is used by the OLC, just not in event interpretation
CSISM uses two types of evidence to represent the occurrence of expected actions for event interpretation.
Hypotheses
• When an accusation is created, a set of hypotheses is proposed that explains the accusation
  – For example, a value accusation means either the accuser or the accused is corrupt, and that the accuser is not dead
• The following hypotheses (both positive and negative) can be proposed
  – Dead: subject is dead; fail-stop failure
  – Corrupt: subject is corrupt
  – Communication-Broken: subject has lost connectivity
  – Flooded: subject is starved of critical resources
  – OR: a meta-hypothesis that one of a number of related hypotheses is true
Accusations lead to hypotheses about the cause of the accusation.
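The accusation-to-hypothesis step can be sketched as below. This is an illustrative reconstruction from the slide's examples, not CSISM code; the tuple encoding of "not" and "OR" hypotheses is my own.

```python
# Illustrative sketch: propose hypotheses that could explain an accusation.
# Per the slides, a value accusation implies the accuser is not dead and
# that either the accuser or the accused is corrupt.
def propose_hypotheses(kind, accuser, accused):
    if kind == "value":
        return [
            ("not", ("dead", accuser)),  # accuser reported, so it is alive
            ("or", [("corrupt", accuser), ("corrupt", accused)]),
        ]
    if kind == "omission":
        return [
            ("not", ("dead", accuser)),
            ("or", [("dead", accused), ("corrupt", accused),
                    ("communication-broken", (accuser, accused)),
                    ("corrupt", accuser)]),  # the accuser may be lying
        ]
    return []  # other accusation kinds omitted in this sketch

hyps = propose_hypotheses("value", "A", "B")
```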
Reasoning Structure
• Hypotheses, accusations, and evidence are connected using constraints
• The resulting graph is used for
  – Coherence search
  – Proving system facts
A graph is created to enable reasoning about hypotheses.
(Figure: a constraint graph linking an accusation through an OR meta-hypothesis to host-dead, comm-broken, and host-corrupt hypotheses; supporting constraints carry positive weights (100) and mutually exclusive ones negative weights (-100, -400).)
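A toy version of such a constraint network can be built as below. The +100/-400 weights echo the figure, but the structure is illustrative, not the CSISM encoding.

```python
import itertools

# Illustrative constraint network: each alternative explanation supports the
# accusation (positive weight); competing explanations inhibit each other
# (negative weight). Weights echo the figure's 100 / -400 values.
def build_network(accusation, alternatives):
    edges = {}
    for h in alternatives:
        edges[(accusation, h)] = 100      # explanation supports the accusation
    for h1, h2 in itertools.combinations(alternatives, 2):
        edges[(h1, h2)] = -400            # competing explanations conflict
    return edges

net = build_network("value(A,B)",
                    ["corrupt(A)", "corrupt(B)", "comm-broken(A,B)"])
```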
Proofs about the System
• The OLC needs to derive as much certain information as it can, but it needs to do this very quickly. The OLC does model-theoretic reasoning to find hypotheses that are theorems (i.e., always true) or necessarily false
• For example, it can assume the attacker has a single platform exploit, and consider each platform in turn, finding which hypotheses are true or false in all cases. Then it can assume the attacker has exploits for two platforms and repeat the process
• A hypothesis can be proven true or proven false or have an unknown proof status
• Claims: Hypotheses that are proven true
“Claims” are definite candidates for response
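The case analysis described above can be sketched as follows. This is an assumed structure, not the actual Prover9/Soar encoding: assume the attacker holds exploits for k platforms, enumerate every choice, and keep only hypotheses true in all cases (claims) or false in all cases.

```python
from itertools import combinations

# Sketch of proof by exhaustive case analysis over attacker exploit assumptions.
# holds(hyp, exploited_platforms) -> bool evaluates a hypothesis in one case.
def prove(hypotheses, platforms, holds, k=1):
    cases = list(combinations(platforms, k))
    claims, refuted = [], []
    for hyp in hypotheses:
        values = [holds(hyp, set(c)) for c in cases]
        if all(values):
            claims.append(hyp)     # theorem: true in every case
        elif not any(values):
            refuted.append(hyp)    # false in every case
    return claims, refuted         # everything else stays "unknown"

# toy model (invented for illustration): "corrupt(L1)" requires a Linux exploit
platforms = ["linux", "windows", "solaris"]
def holds(hyp, exploited):
    if hyp == "corrupt(L1)":
        return "linux" in exploited
    return True

claims, refuted = prove(["corrupt(L1)", "alive(A)"], platforms, holds, k=1)
# "alive(A)" is a claim; "corrupt(L1)" stays unknown (true only in one case)
```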
Coherence
• Coherence partitions the system into clusters that make sense together
  – For example, for a single accusation either the accuser or the accused may be corrupt, but these hypotheses will cluster apart
• Responses can be made on the basis of the partition, or partition membership, when a proof is not available*
In the absence of provable information coherence may enable actions to be taken.
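A minimal coherence sketch, assuming a simple hill-climbing search over an accepted/rejected split (the real algorithm is not specified in the slides): positive constraints pull nodes onto the same side, negative constraints push them apart.

```python
# Illustrative coherence search: greedily move each node to whichever side
# (accepted vs. rejected) better satisfies its weighted constraints.
def coherence(nodes, edges, iters=20):
    accepted = set(nodes)                 # start by accepting everything
    for _ in range(iters):
        changed = False
        for n in nodes:
            gain = 0                      # benefit of n being "accepted"
            for (a, b), w in edges.items():
                if n in (a, b):
                    other = b if a == n else a
                    same_side = other in accepted
                    gain += w if same_side else -w
            if gain < 0 and n in accepted:
                accepted.discard(n); changed = True
            elif gain >= 0 and n not in accepted:
                accepted.add(n); changed = True
        if not changed:
            break
    return accepted, set(nodes) - accepted

# the slide's example: accuser-corrupt and accused-corrupt cluster apart
nodes = ["accusation", "corrupt(A)", "corrupt(B)"]
edges = {("accusation", "corrupt(A)"): 100,
         ("accusation", "corrupt(B)"): 100,
         ("corrupt(A)", "corrupt(B)"): -400}
accepted, rejected = coherence(nodes, edges)
```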
Protection and Cleanup
• Without oversight, resources can be overwhelmed
  – Due to flooding: we rate-limit incoming messages*
  – Due to excessive information accumulation: we take two approaches to mitigate it*
    • Removing outdated information by making it inactive
      – If some remedial action has cleared up a past problem
      – If new information makes previous information outdated or redundant
      – If old information contradicts new information
      – If an inconsistency occurs, we remove low-confidence information until the inconsistency is removed
    • When resources are very constrained, more drastic measures are taken
      – Hypotheses that have not been acted upon for some time will be removed, along with related accusations
Resources are reclaimed and managed to prevent uncontrolled data loss or corruption.
Current Status and Future Plans
• Knowledge representation
  – Accusation translation is implemented
    • May need to change to better align with the evidence
  – Evidence implementation is in process
    • Will leverage the code and structure for accusation generation
  – Use of the coherence partition in response selection: ongoing
• Protection and cleanup are being implemented
  – Flood control development is ongoing
  – The active/inactive distinction is designed and ready to implement
  – Drastic hypothesis removal is still being designed
Much has been accomplished; work still remains.
Response Selection
Main objectives:
• Decide promptly how to react to an attack
• Block the attack in most situations
• Make "gaming" the system difficult
  – Reaction based on high-confidence event interpretation
  – History of responses is taken into account when selecting the next response
  – Not necessarily deterministic
Response Selection Decision Flow
(Figure: Response Selection decision flow. Claims and dilemmas from Event Interpretation drive a propose step; a prune step, informed by history, the Summary, and learning, narrows the proposed responses to the potentially useful ones.)
Response Terminology
• A response is an abstract OLC action, described generically
  – Example: quarantine(X), where X could be a host, file, process, memory segment, network segment, etc.
• A response is carried out as a sequence of response steps
  – Steps for quarantine(X) && isHost(X) include:
    • Reconfigure process protection domains on X
    • Reconfigure firewall local to X
    • Reconfigure firewalls remote to X
  – Steps for quarantine(X) && isFile(X) include:
    • Mark the file non-executable
    • Take a specimen, then delete
• A command is the input to the actuators that implement a single response step
  – Use /sbin/iptables to reconfigure software firewalls
  – Use ADF Policy Server commands to reconfigure ADF cards
  – Use tripwire commands to scan file systems
(Figure: a response specializes into alternative responses (or); a response decomposes (and) into steps, and each step into actuator commands.)
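The response/step/command hierarchy above can be sketched as two lookup functions. The step and command strings are paraphrased from the slide; the structure, not the exact strings, is the point.

```python
# Illustrative decomposition: response -> steps -> actuator commands.
def steps_for(response, target, target_kind):
    if response == "quarantine" and target_kind == "host":
        return [f"reconfigure process protection domains on {target}",
                f"reconfigure firewall local to {target}",
                f"reconfigure firewalls remote to {target}"]
    if response == "quarantine" and target_kind == "file":
        return [f"mark {target} non-executable",
                f"take specimen of {target}, then delete"]
    return []

def commands_for(step):
    # Each step maps to actuator commands, e.g. iptables for firewall steps.
    if "firewall" in step:
        return [f"/sbin/iptables ... # {step}"]
    return [f"# actuator command for: {step}"]

steps = steps_for("quarantine", "hostA", "host")
```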
Kinds of Response
• Refresh – e.g., start from checkpoint
• Reset – e.g., start from scratch
• Isolate -- permanent
• Quarantine/unquarantine -- temporary
• Downgrade/upgrade – services and resources
• Ping – check liveness
• Move – migrate component
The DPASA design used all of these except 'move'. The OLC design has similar emphasis.
Response Selection Phases
• Phase I: propose
  – The set of claims (hypotheses that are likely true) implies a set of possibly useful responses
• Phase II: prune
  – Discard lower priority
  – Discard based on history
  – Discard based on lookahead
  – Choose between incompatible alternatives
  – Choose unpredictably if possible
• Learning algorithm will tune Phase II parameters
Example
• Event interpretation claims "Q1PSQ is corrupt"
• Relevant knowledge:
  – PSQ is not checkpointable
• Propose:
  – (A) Reset Q1PSQ, i.e., reboot, or
  – (B) Quarantine Q1PSQ using a firewall, or
  – (C) Isolate Quad 1
• Prune:
  – Reboot has already been tried, so discard (A)
  – Q1PSQ is not critical, so no need to discard (B)
  – Prefer (B) to (C) because it is more easily reversible, but override if too many previous anomalies in Quad 1
• Learning:
  – Modify the definition of "too many" used when pruning (B)
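This pruning walk-through can be restated as a sketch. The thresholds, flags, and the fallback choice are invented for illustration; only the discard/prefer logic follows the slide.

```python
# Illustrative pruning: drop already-tried and too-disruptive responses,
# then prefer the more easily reversible option unless the quad looks bad.
def prune(proposals, history, critical, anomalies_in_quad, too_many=3):
    keep = [p for p in proposals if p not in history]          # already tried
    keep = [p for p in keep
            if not (p.startswith("quarantine") and critical)]  # too disruptive
    if "quarantine(Q1PSQ)" in keep and anomalies_in_quad < too_many:
        return "quarantine(Q1PSQ)"        # reversible, so preferred
    return keep[-1] if keep else None     # fall back to the broader response

choice = prune(["reset(Q1PSQ)", "quarantine(Q1PSQ)", "isolate(Quad1)"],
               history={"reset(Q1PSQ)"}, critical=False, anomalies_in_quad=1)
# choice == "quarantine(Q1PSQ)"
```

A learning step could then adjust `too_many` over time, as the slide suggests.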
Using Lookahead for Pruning
• Event interpretation provides an intelligent guess about the attacker’s capability
• OLC rules encode knowledge about attacker’s possible goals
• Lookahead estimates the potential future state, given assumptions about capability, goals, and response selection
• If response X has better potential future than Y, favor X
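A minimal fixed-depth lookahead in the spirit of the slide, assuming an alternating defender/attacker game over an abstract state (the state model and scoring are invented; the slides do not specify them):

```python
# Illustrative lookahead: score each response by simulating the attacker's
# best counter-move for a fixed number of steps; keep the response with the
# best worst-case outcome.
def lookahead(state, responses, attacker_moves, apply_move, score, depth=2):
    def value(s, d, defender_turn):
        if d == 0:
            return score(s)
        moves = responses if defender_turn else attacker_moves
        outcomes = [value(apply_move(s, m), d - 1, not defender_turn)
                    for m in moves]
        return max(outcomes) if defender_turn else min(outcomes)
    return max(responses,
               key=lambda r: value(apply_move(state, r), depth - 1, False))

# toy numeric state: blocking helps (+2), the attacker's exploit hurts (-1)
deltas = {"block": 2, "wait": 0, "exploit": -1}
best = lookahead(0, ["block", "wait"], ["exploit"],
                 lambda s, m: s + deltas[m], lambda s: s)
# best == "block"
```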
Status
• Design
  – Rules for proposing responses encoded in first-order logic
  – Corresponding pruning rules described in English
• Implementation
  – Mounting responses for given hypotheses prototyped in Soar
  – Actual response logic is being moved outside Soar
    • Risk mitigation step
  – Some specific to a particular Learning Exercise run
Much less complete than Event Interpretation, but we are getting there…
Fast Containment Response and Policies
Michael Atighetchi (on behalf of Steve Harp)
Inner Loop Controller (ILC) Objectives
• Policy driven: implements policies and tactics from the OLC on a single host
• Autonomous: high-speed response can work when disconnected from the OLC by an attack or failure
• Flexible: policies can be updated at any time
• Adaptive: uses learned characteristics of the host and monitored services to tune the policy
• Low impact on mission: able to back out of defensive decisions when warranted
(Figure: ILC architecture on a host. The ILC, backed by a Policy DB, a Checkpoint DB, and an HW/OS watchdog, drives an AppController and AppFactory that instantiate applications (App1, App2); sensors and actuators connect the policy layer to the Outer Loop Control and remote applications, with separate control and data paths.)
Goal: Attempt to contain and correct problems at the earliest stage possible
Current ILC Policies
Generic:
• Excessive resource consumption (memory, CPU load)
• Tampering with component files
• Unexpected service termination
• Unexpected network connections
Specialized:
• JBI client registration failure
• ADF agent failure modes
• SELinux enforcement status
Policy Example
(Policy flow: a file integrity check fails on F1 -> local file F1 status = corrupted -> file tampering problem -> file cleanup subgoal -> block network and suspend affected services (quarantine) -> restore the file -> remove the block/suspend reasons.)
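The file-tampering policy above can be sketched as condition/action handlers. This is illustrative Python; the real ILC encodes such policies as Jess rules, and the function and action names here are hypothetical.

```python
# Illustrative condition/action handlers for the file-tampering policy.
def on_integrity_failure(path, state):
    state[path] = "corrupted"
    # quarantine first: contain the problem before cleanup
    return ["block_network", "suspend_affected_services"]

def on_file_restored(path, state):
    state[path] = "clean"
    # back out of the defensive decisions once the reason is gone
    return ["remove_block", "remove_suspend"]

state = {}
actions = on_integrity_failure("/etc/passwd", state)
actions += ["restore_file"]
actions += on_file_restored("/etc/passwd", state)
```

The back-out step is what keeps the mission impact low: once the file is restored, the block and suspend reasons are removed.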
ILC: Recent Work
• ILC configuration
  – Made it driven by an (ontology) model of DPASA
  – No need for configuration files for each host
  – The ILC runs on each simulated host and automatically loads the required policy components
• Integration
  – Hitherto the ILC had been (unit) tested stand-alone
  – Initial integration with the Jess DPASA simulator is complete; broadening support for multiple policies
  – Adjustments to the API to match the simulator
ILC: Current Status
• ILC policy to handle various applications
• Model-driven configuration
• Metrics
  – Rules: 94; Functions: 134; Frames: 24; Globals: 20
  – Base reaction time (in unit test): ~4 ms (measuring the inference part only)
  – Target reaction time: < 100 ms
ILC: Ongoing Work
• Complete integration with the rest of the CSISM framework:
  – DPASA simulator
  – ILC-OLC interaction
    • Designed; integration: TBD
• Testing
  – Verify correct reactions in the simulator to various simulated attacks
  – Measure reaction times
Learning Augmentation
Michael Atighetchi
(On behalf of Karen Haigh)
Learning Augmentation: Motivation
• Why learning?
  – Extremely difficult to capture all the complexities of the system, particularly interactions among activities
  – The system is dynamic (a static configuration gets out of date)
• Core challenge: adaptation is the key to survival
• Training options compared:
  – Human: + good data; - complex environment; - dynamic system
  – Offline training: + good data; + complex environment; - dynamic system
  – Online training: - unknown data; + complex environment; + dynamic system
  – CSISM's experimental sandbox: + good data (self-labeled); + complex environment; + dynamic system
• Very hard for the adversary to "train" the learner!
• The sandbox approach was successfully tried in SRS phase 1
Development Plan for Learning in CSISM
1. Responses under normal conditions (calibration)
   • Important first step because it learns how to respond to normal conditions
   • Shown at the June PI meeting
2. Situation-dependent responses under attack conditions (since June)
3. Multi-stage attacks (since June)
Calibration Results for all Registration Times (Beta = 0.0005)
(Chart, shown at the June '07 PI meeting: two "shoulder" points indicate the upper and lower limits. As more observations are collected, the estimates become more confident of the range of expected values, i.e., tighter estimates to observations.)
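One possible shape of the calibration step, assumed on my part (the slide shows only a beta parameter and a band that tightens around the observed range): exponentially weighted tracking of the expected limits of registration times.

```python
# Assumed sketch of online limit calibration: nudge the limits toward each
# observation at rate beta; over many samples the [lo, hi] band tightens
# around the range of values actually seen.
def update_limits(lo, hi, observation, beta=0.0005):
    lo = min(observation, lo + beta * (observation - lo))
    hi = max(observation, hi + beta * (observation - hi))
    return lo, hi

lo, hi = 0.0, 1000.0                  # wide initial band
for t in [40, 55, 38, 61, 47]:       # observed registration times
    lo, hi = update_limits(lo, hi, t)
```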
Multistage Attacks
• Multistage attacks involve a sequence of actions that span multiple hosts and take multiple steps to succeed
  – A sequence of actions with causal relationships
  – An action A must occur to set up the initial conditions for action B; action B would have no effect without previously executing action A
• Challenge: identify which observations indicate the necessary and sufficient elements of an attack (credit assignment), despite:
  – Incidental observations that are either side effects of normal operations or chaff explicitly added by an attacker to divert the defender
  – Concealment (e.g., to remove evidence)
  – Probabilistic actions (e.g., to improve the probability of success)
Architectural Schema for Learning of Attack Theories and Situation-Dependent Responses
(Figure: CSISM sensors (ILC, IDS) produce observations ending in failure of the protected system; only some are essential. An attack theory experimenter generates viable attack theories, and a defense measures experimenter derives viable defense strategies and detection rules; both are tested in a "sandbox" that feeds back observations, actions, and failure signals.)
Multi-Stage Learner
• Do {
  – Generate a theory according to a heuristic
    • The complete set of theories is all permutations of all members of Powerset(observations)
  – Test the theory (the hard part!)
  – Incrementally update the OLC/ILC rulebase
• } while theories remain
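The generate/test/update loop above can be sketched as below, with a "shortest first" heuristic standing in for the ones on the next slide. Everything here is illustrative; the real learner tests theories in the sandbox and updates the OLC/ILC rulebase as it goes.

```python
from itertools import combinations, permutations

# All permutations of all members of the powerset of the observations,
# generated lazily in shortest-first order.
def theories(observations):
    for size in range(1, len(observations) + 1):
        for subset in combinations(observations, size):
            yield from permutations(subset)

def learn(observations, test_theory):
    valid = []
    for theory in theories(observations):
        if test_theory(theory):      # run the theory in the sandbox
            valid.append(theory)     # would also update OLC/ILC rules here
    return valid

# toy sandbox: the real attack is A followed by C
valid = learn(["A", "B", "C"], lambda t: t == ("A", "C"))
# valid == [("A", "C")]
```

Generating theories lazily is what makes the combinatorics tolerable: with 10 observations there are ~10 million potential trials, so the ordering heuristic determines how soon the shortest valid attacks are found.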
Heuristics & Structure of Results
• Primary goal: find all shortest valid attacks (i.e., the minimum required subset) as soon as possible
  – Example: in ABCDE, AC and DE may both be valid
• Secondary goal: find all valid attacks as soon as possible
  – Example: in ABCDE, ABC may also be valid
• Heuristics
  – Shortest first
  – Longest first
  – Edit distance to the original
  – Dynamic resort to the valid set
    • Initially, edit distance to the original attack; remaining theories are compared to all valid attacks and the edit distance is averaged
  – Dynamic resort / free to remove "chaff"
    • Same as "dynamic resort to the valid set", but the cost of deletion is zero
• Worst-case comparison: sort the theories so that the shortest valid attack is found last and all valid attacks come at the end
Comparison of Heuristics
(Chart: a 4-observation, 3-stage attack; 4 observations = 64 potential trials, 10 observations = 10 million potential trials.)
Incremental Hypothesis Generation
• The enhanced query learner generates attack hypotheses
  – incrementally, with low memory overhead, so it is able to explore large observation spaces (>>8 steps)
  – in heuristic order, to acquire the concept rapidly
• Heuristic bias:
  – look for shorter attacks first (adjustable prior)
  – suspect the order of steps has an influence
  – suspect steps interact positively (for the attacker)
  – performance comparable to edit-dist + length
Status, Development Plan & Future Steps
At the June '07 PI meeting:
1. Responses under normal conditions (calibration)
   a. Analyze DPASA data (done)
   b. Integrate with ILC (single node) (done)
   c. Add experimentation sandbox (single node)
   d. Calibrate across nodes
2. Situation-dependent responses under attack conditions
3. Multi-stage attacks
Since June:
1. Development of the sandbox and initial integration efforts with the learner (done)
   • Attack actions, observations, and control actions
   • Quality signal
2. Development of the multistage algorithm (version 1.0 done)
   • Theories with the sandbox
   • Incremental generation of theories
   • TODO: ILC input / OLC output
Simulated Testbed
Michael Atighetchi
(on behalf of Michael Atighetchi)
Why Simulation?
(Figure: (a) the path topology between client and core in the defense-enabled JBI as tested under OASIS Dem/Val: a client host (SD, LC, Reg, Linux/Windows, Vera, ADF), Cisco PIX firewalls and a Cisco 3750 switch over physical CAT5 wiring, an access proxy host (CSA, SELinux LSM), and proxies (DCProx, PSQProx, CorProx, TsProx) in front of the base server. (b) virtualization of (a) in the simulation of the defense-enabled system: real components for the CSISM logic (ILC, Reg, FSI), virtual mechanisms for FW, OS, VPN, and WAN, with connectivity, data flow, and control flow shown.)
• Use a specification
• Use as integration middleware
• Use for red team experimentation
JessSim: The JESS Simulator
(Figure: JessSim structure. World facts — Host, NIC, C3750, SELinux, CSA, Tripwire, LC, SM, ... — are generated via a Protégé plugin. Cross-object inference produces inferred facts such as can-talk and mech-on-path; application-level protocols produce state facts such as register-client and publish-IO. Inference and protocols are implemented via JESS rules and functions.)
JessSim Current Status
Implemented protocols (14): Plumbing (5 rules), Alert (6 rules), Registration (8 rules), SELinux (1 rule), Reboot (3 rules), LC message (3 rules), ADF (3 rules), Heartbeat (1 rule), PSQ (3 rules), Tripwire (3 rules), ServiceControl (1 rule), POSIX signals (1 rule), Process memory/CPU status (2 rules), Host memory/CPU status (2 rules)
Implemented attacks (8):
• Availability: disable the SELinux service
• Availability: shut down a host
• Availability: cause a Downstream Controller to crash
• Availability: corrupt endpoint references in SMs
• Availability: kill processes via kill -9
• Integrity: corrupt files
• Policy violation: create a new (rogue) process
• Availability: cause a process to overload the CPU
• Test coverage:
  – Unit tests: 28 JUnit tests covering protocol details
  – OASIS Dem/Val: main events of DPASA Run #6
• Fidelity
  – Focused on application-level protocols
JessSim Ongoing Work
• Increase fidelity of the network simulation
  – Checks for network connectivity [crash(router) => comm-broken(A, B)]
  – Simulation of TCP/IP flows for the ILC
• Increase fidelity of the host simulation for the ILC
  – install-network-block / remove-network-block
  – note-network-connection / reset-network-connection
  – quarantine-file / restore-file / delete-file / checkpoint-file
  – note-selinux-down / note-selinux-up
  – shun-network-address / unshun-network-address
  – enable-interface / disable-interface
  – set-boot-option
• Protocols for ILC/OLC communication
  – forward-to-olc()
• Cleanup
  – Convert all time units to seconds in all scenarios
Next Steps: Integration and Evaluation
Partha Pal
Learning Integration
• ILC learning:
  – Pre-deployment calibration: learn threshold parameters for registration times
  – Calibrate across nodes
• OLC learning:
  – Results from learning with the experimentation sandbox
  – Parameter tuning
  – New rules/heuristics
ILC <-> OLC Integration
• ILC -> OLC
  – Calls to the OLC implemented in ILC policies via calls to ilc-api
  – ILC as an informant to the OLC
  – ILC as a henchman of the OLC
• OLC -> ILC
  – The OLC can process alerts forwarded to it from the ILC
  – Consider the ILC as a mechanism during response selection
JessSim Integration
• ILC integration with JessSim
  – The "ArrestRunawayProcess" loop is working
  – Implement the file, network, and reboot protocols necessary to support the other existing ILC loops
• OLC integration with JessSim
  – The OLC is fully integrated with JessSim
  – Adjust the integration given changes due to
    • moving the transcription logic (Alerts -> Accusations, Observations -> Evidence) into Jess
    • performing response selection in Jess
• Integration framework
  – All components execute within a single JVM
  – Support execution of the ILC and OLC on dedicated hosts to measure timeliness
Integration Framework: Current Status
(Figure: current integration framework. All components run in a single JVM on the simulator host: simulated hosts (Q1AP, Q1SM, Q1DC, Q1PSQ, Q1PS, AODB), an HP switch, a Cisco 3750, and an ADF NIC, with the ILC instances and the OLC as real code. An attacker corrupts a file; alerts and observations flow from the simulated hosts to the ILC and OLC, and commands flow back, in steps 1-8.)
Integration Framework: Needed for Red Team Experimentation
(Figure: the integration framework needed for red team experimentation. Same topology and alert/command flow as the current status, but the ILC instances and the OLC each run in their own JVMs on dedicated ILC and OLC hosts, separate from the simulator host.)
Evaluation
• Interaction with the red and white teams
  – Initial telecon (late October)
  – Continued technical interchange about CSISM capabilities
  – Potential gaps/disagreements
    • How to use the simulator
    • Evaluation goals
  – Next steps
    • Demonstration of the system
    • Red team visit
    • Code drop