Issues and Ideas in Software Reliability for FCS
Joe Loyall
BBN Technologies
5/18/2004 Joe Loyall 2
General Issues Affecting Reliability of FCS
• Size and complexity: very large, complex systems
– Many interoperating parts, developed by different people, including legacy components
– Unreliability of any one part can affect the whole system, but the reliability of any one part may have little effect on the reliability of the entire system
• Large mission requirements decompose into distributed (and some local) requirements
– It is too easy to decompose them poorly
• One can verify, validate, and unit test individual pieces
– However, the reliability of the whole is not the sum of the reliability of the parts
• Abstracting away the details can help one understand the high-level design
– However, putting the details back in later can reintroduce the complexity and the bugs
• Some things can’t be put back in later, because they are pervasive
– Trying to insert them after the fact can greatly increase the fragility of the system
– QoS, security, and fault tolerance are examples
• Tying too tightly to a hardware platform can lead to future brittleness; tying too loosely can lead to bugs associated with lack of control
– Motivates the need for a middle layer
• Reliability of the system can be limited by the quality of the least capable programming group
– Motivates the need for strong processes, tools, patterns, etc.
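The claim that the reliability of the whole is not the sum of the parts can be made concrete with the textbook series-system model. This is a simplifying assumption, not something the slide states: the system works only if every part works, and parts fail independently, so system reliability is the product of part reliabilities. The function name `series_reliability` and the numbers are illustrative.

```python
from math import prod


def series_reliability(parts: list[float]) -> float:
    """Reliability of a system that works only if every one of its parts works.

    Assumes failures are independent, which is often optimistic for real systems.
    """
    for r in parts:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability must be in [0, 1], got {r}")
    return prod(parts)


# 100 parts, each 99.9% reliable: the whole is well below any single part.
print(round(series_reliability([0.999] * 100), 4))  # 0.9048

# Making one part perfect barely moves the whole...
print(round(series_reliability([1.0] + [0.999] * 99), 4))  # 0.9057

# ...while one weak part (say, from the least capable team) drags it down.
print(round(series_reliability([0.9] + [0.999] * 99), 4))  # 0.8151
```

Under this model, perfecting one part changes little, while one weak part lowers the whole noticeably, which matches both the "little effect" bullet and the "least capable programming group" bullet.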
Topic 1: Building Reliable FCS Software with Managed Quality of Service (QoS)
• Managed QoS in distributed real-time and embedded (DRE) systems is crucial
– Providing managed QoS currently complicates application development significantly, especially in distributed environments
– QoS has traditionally been handled with static provisioning
– Recent research has developed the ability to handle QoS at runtime with control and adaptation
• New advances are needed to develop reliable FCS software
– We can’t move back to only static provisioning, because FCS is too dynamic
– Runtime QoS control, however, is only one part of software reliability
• We need to continue to build upon the advances of recent years…
– Separate programming of QoS and functionality
– Design-time specification and runtime enforcement of QoS
– Predictable end-to-end QoS in dynamic environments
– Component-sized units for encapsulation, reuse, and composition
• …while moving forward to support the design and implementation of reliable, QoS-managed FCS software
– Modeling of QoS aspects separately from, but alongside, functional and component modeling
– Programming to well-defined QoS interfaces and standard protocols
– Reusable, encapsulated, but configurable QoS behaviors that can be assembled reliably
– Models, tools, patterns, and processes
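A minimal sketch of separating QoS programming from functionality, with design-time specification and runtime enforcement: the QoS policy is a declarative contract of operating regions, evaluated against measured conditions at runtime, and the functional code only receives the chosen setting. `Region`, `QosContract`, `send_imagery`, the bandwidth thresholds, and the frame rates are all hypothetical; this is not the API of any particular QoS middleware.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Region:
    """One operating region of a QoS contract (hypothetical structure)."""
    name: str
    predicate: Callable[[dict], bool]  # evaluated over measured runtime conditions
    frame_rate: int                    # QoS setting to enforce while in this region


class QosContract:
    """QoS policy specified at design time and evaluated at runtime.

    Regions are checked in order; the first whose predicate holds is chosen,
    and the last region is the fallback if none holds.
    """

    def __init__(self, regions: list[Region]):
        if not regions:
            raise ValueError("a contract needs at least one region")
        self.regions = regions

    def evaluate(self, conditions: dict) -> Region:
        for region in self.regions:
            if region.predicate(conditions):
                return region
        return self.regions[-1]


def send_imagery(frames_per_second: int) -> str:
    """Functional code: knows nothing about bandwidth, regions, or policy."""
    return f"sending imagery at {frames_per_second} fps"


contract = QosContract([
    Region("normal", lambda c: c["bandwidth_kbps"] >= 1000, frame_rate=30),
    Region("degraded", lambda c: c["bandwidth_kbps"] >= 200, frame_rate=5),
    Region("minimal", lambda c: True, frame_rate=1),
])

for bandwidth in (2000, 500, 50):
    region = contract.evaluate({"bandwidth_kbps": bandwidth})
    print(region.name, "->", send_imagery(region.frame_rate))
```

Changing a threshold or adding a region touches only the contract, never `send_imagery`, which is the benefit the separation is meant to buy.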
Area of Focus: SoS QoS
Designing an SoS must consider several dimensions of QoS:
• QoS for each individual end-to-end string (SDMS/W)
• QoS for multiple end-to-end application strings competing for resources
• Doing this for non-fixed, changing numbers of application strings
• Handling it dynamically, where conditions change over time
Technologies and processes to make it feasible to handle QoS at the System of Systems (SoS) level of abstraction:
• Modeling tools that support design of the QoS aspects of an SoS separately from, but alongside, functional components
• QoS interfaces and patterns of use that enforce managed assembly and disciplined composition of QoS and functional components (à la type checking and IDL)
• Multi-layer QoS design and management
– Mission layer coordinates missions and mission-level policies
– Coordination layer manages QoS for logically or physically related sets of components
– Resource layer manages QoS for individual resources or mechanisms
• Reusable, validated QoS components
• Assembly, deployment, and configuration with validated behavior
– Validated QoS behaviors assembled into validated patterns using enforcing interfaces
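The multi-layer design above can be sketched, under heavy simplification, as three functions over a changing set of application strings that share one link. The strict-priority allocation, the `AppString` type, and all names and numbers are hypothetical choices for illustration, not the design the slide proposes.

```python
from dataclasses import dataclass


@dataclass
class AppString:
    """One end-to-end application string competing for a shared resource."""
    name: str
    priority: int       # set by the mission layer; higher is more important
    demand_kbps: float  # bandwidth the string would like to use


def mission_layer(strings: list[AppString], policy: dict[str, int]) -> None:
    """Mission layer: apply mission-level policy (here, only priorities)."""
    for s in strings:
        s.priority = policy.get(s.name, s.priority)


def coordination_layer(strings: list[AppString], capacity_kbps: float) -> dict[str, float]:
    """Coordination layer: divide one shared link among related strings.

    Strict priority order; no string is granted more than it asked for.
    Recomputed from scratch on each call, so strings may come and go.
    """
    allocation: dict[str, float] = {}
    remaining = capacity_kbps
    for s in sorted(strings, key=lambda string: string.priority, reverse=True):
        grant = min(s.demand_kbps, remaining)
        allocation[s.name] = grant
        remaining -= grant
    return allocation


def resource_layer(allocation: dict[str, float], name: str, requested_kbps: float) -> float:
    """Resource layer: enforce the per-string cap at the mechanism level."""
    return min(requested_kbps, allocation.get(name, 0.0))


strings = [
    AppString("video", 1, 800),
    AppString("tracks", 1, 300),
    AppString("logistics", 1, 400),
]
mission_layer(strings, {"tracks": 3, "video": 2})
allocation = coordination_layer(strings, capacity_kbps=1000)
print(allocation)                                # {'tracks': 300, 'video': 700, 'logistics': 0}
print(resource_layer(allocation, "video", 900))  # 700
```

Each layer only consumes the output of the one above it, so the mission policy or the allocation rule can be swapped without touching resource enforcement; a real coordination layer would likely use a fairer policy than strict priority.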
Topic 2: Processes and Methods for FCS Software Reliability
• Modeling is important, but it is not a silver bullet, and it can be dangerous
– Models can diverge from the implementation over time (is incorrect documentation worse than no documentation?)
– Models are frequently higher level and more abstract, to capture the top-down design, but introducing the details later introduces bugs and complexity (need proper abstractions and correct, complete decomposition support)
– Modeling can introduce more opportunities for errors
   • Models can be incorrect (need for model validation)
   • Code synthesizers can be incorrect
   • Interaction with legacy or handwritten code can introduce errors
• Well-defined interfaces and “type” enforcement
– Component interfaces and type enforcement have prevented many instances of common errors
– With some attention and research, could equivalents be developed for QoS, security, fault tolerance, etc.?
• Verification and constraint concepts might provide partial solutions to FCS reliability
– Constraints and verification at many levels (from the higher, abstract design level down through each decomposition): the only way to scale the idea to the size and complexity of FCS
– “Proof-carrying code”-like enforcement of constraints on functionality and QoS, for assembly, deployment, configuration, and runtime
   • Can prevent errors in some cases
   • Earlier detection (in the life cycle) of other problems
   • Aid in software correctness over the system’s lifetime
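In the spirit of the question above about “type” enforcement for QoS, here is a minimal sketch of an assembly-time check: each connection’s provided QoS must satisfy the consumer’s required QoS, and a mismatched assembly is rejected before deployment, much as an IDL compiler rejects mismatched interfaces. This is closer to type checking than to proof-carrying code. The `QosSpec` attributes, the function names, and the component names are hypothetical; a real tool would check many more properties and would need the provided guarantees themselves to be validated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosSpec:
    """QoS properties of a port (hypothetical attributes).

    On a provided port these are guarantees; on a required port, needs.
    """
    max_latency_ms: float
    min_throughput_msgs: float
    encrypted: bool


def check_connection(src: str, provided: QosSpec, dst: str, required: QosSpec) -> list[str]:
    """Return the QoS mismatches on one connection; an empty list means it checks."""
    link = f"{src} -> {dst}"
    errors = []
    if provided.max_latency_ms > required.max_latency_ms:
        errors.append(f"{link}: latency {provided.max_latency_ms} ms exceeds "
                      f"required {required.max_latency_ms} ms")
    if provided.min_throughput_msgs < required.min_throughput_msgs:
        errors.append(f"{link}: throughput {provided.min_throughput_msgs} msg/s below "
                      f"required {required.min_throughput_msgs} msg/s")
    if required.encrypted and not provided.encrypted:
        errors.append(f"{link}: encryption required but not provided")
    return errors


def check_assembly(connections: list[tuple[str, QosSpec, str, QosSpec]]) -> list[str]:
    """Check every connection before deployment, like a type checker over a whole program."""
    errors: list[str] = []
    for src, provided, dst, required in connections:
        errors.extend(check_connection(src, provided, dst, required))
    return errors


assembly = [
    ("sensor", QosSpec(50, 20, encrypted=False), "fusion", QosSpec(100, 10, encrypted=True)),
    ("fusion", QosSpec(80, 5, encrypted=True), "display", QosSpec(200, 2, encrypted=True)),
]
for problem in check_assembly(assembly):
    print("rejected:", problem)
```

Here the only reported problem is the unencrypted sensor-to-fusion link, caught at assembly time instead of in the field.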
Topic 3: Open Standards, Open Source, and Alternative Models
• Open standards and open source are trends that are unlikely to reverse
– Economic benefits: no single vendor for a technology; longer-lived technology bases
– Fewer stove-piped, one-of-a-kind systems
– Pushes the technology up
   • System developers can assume the existence of infrastructure and of programmers who understand it
   • Enables the development of systems with greater capability, because they don’t have to be built from the ground up
– However, they make integration more important and more frequent
• Program development models that increase reliability
– There are domains in which software is well engineered and reliable
– For example, many business applications that were previously developed by professional programmers are developed today by domain experts (e.g., accountants) in well-established, reliable tools (e.g., spreadsheet programs)
– Are there parts of FCS SoS building that could likewise, with the proper tool support, be turned over to domain experts, and what would be needed to enable it?
   • Patterns of use and idioms that lead programmers to produce correct software
   • Modeling or other programming tool environments with domain-friendly interfaces
   • Because their focus is narrow and domain-specific, these tools could be highly constraining, allowing the production of only well-behaved, reliable software
Topic 4: Certification of FCS SoS Software
• Certification is already a difficult issue, and the highly distributed, heterogeneous, and dynamic nature of large SoS software makes certification with current processes even more difficult
– However, systems of greater scale, distribution, interaction, and dynamism are inevitable and need to be certified
– The nature of the systems being certified and the nature of the certification process might need to evolve simultaneously
– Certification of individual components or participants is unlikely to scale well to certification of the entire system
– Can certification of individual behaviors contribute to certification of a system that can change its behavior?
• We can provide techniques that support the certification of dynamic systems
– Increase the ability to certify dynamic systems by constraining their dynamism
   • Critical subsystems limited to dynamically choosing from a set of certified static choices
– If we can’t certify exactly correct behavior for highly dynamic systems, perhaps we can certify their limits
   • For example, certify that an adaptive system can do no harm: while we might not be able to certify exactly how it can adapt, we can certify how much, or within what limits, it can adapt, or how far its adaptation can affect the rest of the system
– Can we certify the adaptive mechanisms that delimit behavior, recover, protect, or keep software operating within a “safe” subset of possibilities?
• In a highly dynamic, distributed system, even if we cannot certify that it is free from defects, perhaps it is sufficient to certify that the system would gracefully handle, recover from, or fix defects
• How do we certify the adaptive mechanisms? This is useful if we can presume that doing so is simpler than certifying the full system behavior
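Two of the certification ideas above, choosing only among certified static configurations and certifying limits rather than exact behavior, can be sketched as a small runtime guard placed between an adaptive policy and the subsystem it controls. Everything here is hypothetical: the configurations, the envelope limits, and the names `GuardedAdapter` and `clamp_to_envelope`. The intent is that only the guard, which is far simpler than the adaptive policy, would need to be certified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    cpu_share: float  # fraction of CPU the subsystem may use
    rate_hz: float    # output update rate


# Hypothetical configurations, each certified offline as a static system.
CERTIFIED = {
    "full": Config("full", cpu_share=0.50, rate_hz=50.0),
    "reduced": Config("reduced", cpu_share=0.30, rate_hz=20.0),
    "survival": Config("survival", cpu_share=0.10, rate_hz=5.0),
}

# Hypothetical certified envelope for continuous adaptation.
MAX_CPU_SHARE = 0.50
MIN_RATE_HZ = 5.0


class GuardedAdapter:
    """Lets an (uncertified) adaptive policy pick only certified configurations."""

    def __init__(self, initial: str = "survival"):
        self.current = CERTIFIED[initial]

    def request(self, name: str) -> Config:
        candidate = CERTIFIED.get(name)
        if candidate is not None:
            self.current = candidate
        # An uncertified request is refused and the current configuration kept.
        return self.current


def clamp_to_envelope(cpu_share: float, rate_hz: float) -> tuple[float, float]:
    """Whatever a continuous adaptation asks for, keep it inside certified limits."""
    return min(cpu_share, MAX_CPU_SHARE), max(rate_hz, MIN_RATE_HZ)


adapter = GuardedAdapter()
print(adapter.request("full").name)          # full
print(adapter.request("experimental").name)  # full (refused, unchanged)
print(clamp_to_envelope(0.9, 1.0))           # (0.5, 5.0)
```

The adaptive policy can be arbitrarily clever; whatever it decides, the subsystem only ever runs a certified configuration or a setting inside the certified envelope.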
Some Additional Technical Ideas Relevant to Reliable FCS Software
Survivability for FCS
Defense Enabling: Dynamism for Survivability
• Survival of critical systems, as much as security, is crucial
• Adaptation is essential to survive organized, malicious attack
– Tolerate and recover from failures induced by the attack
– Compensate (e.g., graceful degradation) if the attacker succeeds in preventing use of required resources
– Introduce artificial diversity to increase the attacker’s work factor
• Adaptive response involves dynamic management of system resources and properties
– Integration of system properties (e.g., real-time, security, dependability) and the associated tradeoffs
– Strategies for coordinated, distributed, but secure adaptation and management
• Adaptive response is supported by
– Redundancy (eliminate single points of failure)
– Heterogeneity (prevent common mode failures)
– Uncertainty (slow staged attacks)
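Redundancy and heterogeneity combine naturally in replicated execution with voting. The sketch below is a standard majority voter, offered as a hypothetical illustration rather than the mechanism the slides describe: the same request goes to several independently implemented replicas, and a result is accepted only if a strict majority of all replicas agrees. `vote` and the replica functions are invented names.

```python
from collections import Counter
from typing import Callable, Sequence


def vote(replicas: Sequence[Callable[[int], int]], request: int) -> int:
    """Return the value produced by a strict majority of all replicas.

    A replica that raises counts as a missing vote. Raises RuntimeError
    when no value reaches a majority (too many failures or disagreements).
    """
    results = []
    for replica in replicas:
        try:
            results.append(replica(request))
        except Exception:
            pass  # crashed or unreachable replica: no vote
    if results:
        value, count = Counter(results).most_common(1)[0]
        if count > len(replicas) // 2:
            return value
    raise RuntimeError("no majority among replicas")


# Hypothetical diverse implementations of the same function...
def impl_a(x: int) -> int:
    return x * x


def impl_b(x: int) -> int:
    return x ** 2


# ...and two failure modes an attacker might induce.
def compromised(x: int) -> int:
    return -1


def crashed(x: int) -> int:
    raise ConnectionError("replica unreachable")


print(vote([impl_a, impl_b, compromised], 7))  # 49: one bad replica is outvoted
print(vote([impl_a, impl_b, crashed], 7))      # 49: one crash is tolerated
```

The vote masks a compromised replica only if the other replicas do not share its vulnerability; replicas built from the same code on the same platform can fail together, which is why redundancy is paired with heterogeneity.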
Architecting Survivability into FCS (and other SoS)
Reliability requires architecting in multiple dimensions, even more so when the goal is to be resilient not only against errors, but also against attacks:
• Diversity: avoid common mode vulnerabilities
• Layers of protection
– Both HW and SW
– Design principles, architectural constraints
– High barrier to intrusion
• Adaptive response
– Adaptive middleware
– Rapid and coordinated response
– Isolation, recovery, graceful degradation
• Redundancy: no single point of failure in critical functionality
• Weak assumptions
– Less susceptible to the attacker’s manipulation of the environment
• Detection and correlation
– Embedded sensors
– Mix of IDS and policy-violation detection
– Advanced, distributed correlation
General principles for survivability
• Protect as best as possible
• Improve chances of detection
• Adapt to manage gaps
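Detection and correlation appear above only as a few words. One common way to combine a mix of sensors, shown here as a hypothetical sketch rather than the architecture’s actual correlator, is to flag a host only when at least `min_sensors` distinct sensor types (for example an IDS and a policy-violation monitor) alert on it within a short time window. `Alert`, `correlate`, and the host and sensor names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    time_s: float
    host: str
    sensor: str  # e.g. "ids" or "policy" (hypothetical sensor types)


def correlate(alerts: list[Alert], window_s: float, min_sensors: int) -> set[str]:
    """Flag hosts where at least `min_sensors` distinct sensor types alerted
    within `window_s` seconds of each other (sliding window per host)."""
    by_host: dict[str, list[Alert]] = {}
    for a in alerts:
        by_host.setdefault(a.host, []).append(a)

    flagged: set[str] = set()
    for host, host_alerts in by_host.items():
        host_alerts.sort(key=lambda a: a.time_s)
        start = 0
        for end in range(len(host_alerts)):
            # Drop alerts that fall out of the window ending at host_alerts[end].
            while host_alerts[end].time_s - host_alerts[start].time_s > window_s:
                start += 1
            sensors = {a.sensor for a in host_alerts[start:end + 1]}
            if len(sensors) >= min_sensors:
                flagged.add(host)
                break
    return flagged


alerts = [
    Alert(0, "hostA", "ids"), Alert(30, "hostA", "policy"),    # corroborated
    Alert(0, "hostB", "ids"), Alert(500, "hostB", "policy"),   # too far apart
    Alert(10, "hostC", "ids"), Alert(20, "hostC", "ids"),      # one sensor type only
]
print(correlate(alerts, window_s=60, min_sensors=2))  # {'hostA'}
```

Requiring corroboration lowers the false alarm rate but can delay or miss detections; `min_sensors` and `window_s` are the knobs for that tradeoff.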
Use of Modeling for Validation of Integrated Survivability
[Figure: argument tree used to validate integrated survivability. Diagram layers: requirements decomposition; executable model of the system (probabilistic or logical); model assumptions; supporting arguments and experimentation. Node labels, in the order extracted:]
PIP requirements 1–4; JBI survivability requirements; Initialized JBI provides essential services; Authorized publish is processed successfully; Confidentiality; Dataflow; Timeliness; Integrity (from functional model execution); Component Model Assumptions Hold; JBI intrusion detection requirements
PA1: Client-Core Communication I & C; PA2: Alternate Path Availability; QA1: QIS Incorruptibility; QA2: QIS Communication Cutoff; QA3: QIS Input Integrity; QA4: QIS Function Correctness; AA1: AP Function Correctness; AA2: AP Application-layer Integrity; AA3: AP Application-layer Confidentiality; DA1: DC Communications; SA1: IO Integrity in PSQ Server; SA2: Client Confidentiality in PSQ Server; SA3: IO Authenticity; SA4: Network-layer I & C; SeA1: Sensor False Alarm Rate; SeA2: Sensor Detection Delay; SeA3: Sensor Detection Probability; CoA1: Correlator False Alarm Rate; MA1: SM Byzantine Agreement; PsA1: ADF Policy Server Input Correctness; PsA2: ADF Policy Server Synchronization
System Connectivity; Physical Topology; Network Topology; Restricted Routing; No Tunneling Attacks; SELinux; Solaris; Windows; Type Enforcement; Hardened Kernel; IKENA StormWatch; Platform Mechanisms; Process Domain Policies
Private Key Confidentiality; No Unauthorized Direct Access; Keys Protected from Theft; DoD Common Access Card (CAC); PKCS #11 Tamperproof; Keys Not Guessable; Algorithmic Framework; Key Length; Key Lifetime; No Unauthorized Indirect Access; Physical Protection of CAC device; Protection of CAC Authentication Data; No Compromise of Authorized Process Accessing CAC; No Cryptography in Access Proxy; Not Preconfigured; Not Reconfigurable
ADF NIC services protected; ADF Correctness; ADF NIC Physical Security; ADF NIC Firmware Initialization; ADF Key Initialization; ADF Agent Initialization; ADF Protocol Correctness; ADF Host Independence; ADF Agent Correctness; VPG Integrity; VPG Confidentiality; Policy Server Integrity; ADF Policy Correctness; Correctness of Registration Protocol; Correctness of Reattachment Protocol
Hard-wired Configuration; Electrically Isolated; Physically Protected; Connectivity; Physical Integrity; Electrical Integrity; Gate Configuration and Truth Table; Proxy Protocol Configuration; Can Identify Malformed Traffic; Correctness of Rate Control Mechanisms; Correctness of Certificate Exchange; IDS Experimental Evaluation; Correctness of Modified ITUA Protocols; Functional model faithful to design
IDS / Correlation requirements; IO Confidentiality (end-to-end); Confidentiality of Network Communications; Confidential info is not exposed; Unauthorized activity is properly rejected; Authorized join/leave is processed successfully; Authorized query is processed successfully; Authorized subscribe is processed successfully; JBI is properly initialized; Design Team Review
Attack Model Assumptions Hold; Functional Model Assumptions Hold; Infrastructure Attack Propagation; Data Attack Propagation; Attacks Originate Outside the Platform; No Data Attacks Outside the Platform; Initial Targets of Infrastructure Attacks; Isolation of Intruded Process Domains; Targets for Loss of IO Confidentiality; No Compromise or Failure of QIS; DoS Causes Processing Delays; DoS Does Not Corrupt Other Components; DoS Attacks Do Not Propagate from Clients to Core; Design Faithfully Implemented; Absence of Insider Threat; Attack Model Parameter Selection; CERT Vulnerability DB Analysis; Variation over Anticipated Ranges; Correctness of Managed Switch; IO Confidentiality in Transit; IO Confidentiality in Storage; Confidentiality of Application-layer Messages
[Plot: Fraction of successful publishes versus MTTD_A (min). x-axis: MTTD_A (min), 10 to 100000 (logarithmic); y-axis: fraction of successful publishes, 0 to 1; one curve each for 12-hour, 24-hour, and 48-hour missions.]
• Survivability results obtained through modeling
– Critical functionality is available with high probability, even when under heavy, successful attack
– 98% of all functions are successful, even with vulnerabilities discovered daily or faster
– Operating system diversity bolsters the reliability of critical functionality when under attack
– With the current architecture, attackers are more effective at compromising functionality than at crashing components
[Plot: Total number of intrusions versus MTTD_A (min). x-axis: MTTD_A (min), 10 to 10000 (logarithmic); y-axis: total number of intrusions, 0 to 600; one curve each for 12-hour, 24-hour, and 48-hour missions.]
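The survivability results above come from an executable probabilistic model of the system, which the slides do not spell out. The sketch below is a hypothetical toy, neither that model nor a reproduction of its numbers, showing how such a model can expose the effect of operating system diversity. Each replica’s platform is compromised at an exponentially distributed time with mean `mean_ttc` minutes (a made-up parameter, not claimed to equal the slides’ MTTD_A), and a critical function survives the mission if at least `needed` replicas remain uncompromised. Identical platforms share one vulnerability and fall together. An analytic answer is paired with a Monte Carlo estimate as a cross-check.

```python
import math
import random


def analytic_survival(n: int, needed: int, diverse: bool, mean_ttc: float, mission: float) -> float:
    """P(at least `needed` of `n` replicas are uncompromised at mission end)."""
    if not 1 <= needed <= n:
        raise ValueError("need 1 <= needed <= n")
    p = math.exp(-mission / mean_ttc)  # one platform survives the whole mission
    if not diverse:
        return p  # a single shared vulnerability takes out every replica at once
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


def simulate_survival(n: int, needed: int, diverse: bool, mean_ttc: float,
                      mission: float, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the same probability."""
    survived = 0
    for _ in range(trials):
        if diverse:
            # Independent platforms: each is compromised at its own random time.
            times = [rng.expovariate(1 / mean_ttc) for _ in range(n)]
        else:
            # Identical platforms: one compromise time applies to all replicas.
            times = [rng.expovariate(1 / mean_ttc)] * n
        healthy = sum(t > mission for t in times)
        if healthy >= needed:
            survived += 1
    return survived / trials


rng = random.Random(1)
for diverse in (False, True):
    exact = analytic_survival(3, 1, diverse, mean_ttc=1440, mission=1440)
    estimate = simulate_survival(3, 1, diverse, 1440, 1440, trials=20_000, rng=rng)
    print(f"diverse={diverse}: analytic {exact:.3f}, simulated {estimate:.3f}")
```

With these made-up parameters the analytic probabilities are about 0.368 without diversity and 0.747 with it. The survival criterion matters, though: with `needed=2` (a majority of three, as a voter would require) and a mean time to compromise as short as the mission, diversity does worse in this toy model, because independent compromises make it harder to keep a majority intact.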