Challenges in Applying ML-Enabled Systems in Avionics

Mauricio Castillo-Effen, Ph.D.

FoMLAS, Thessaloniki, Greece, April 20th, 2018


  • © Mauricio Castillo-Effen

    Challenges in Applying ML-Enabled Systems in Avionics

    Mauricio Castillo-Effen, Ph.D.

    FoMLAS

    Thessaloniki, Greece, April 20th, 2018

  • © Mauricio Castillo-Effen

    Overview

    About the speaker

    I. Definitions

    II. Benefits of Learning-Enabled Avionics

    III. Challenges

    IV. Promising Directions

  • © Mauricio Castillo-Effen

    About the Speaker

    • Worked for the past 15+ years in “Autonomous Systems”

    • Mostly, focused on aerial systems/avionics, air traffic management

    • Gradual shift of focus from functional to non-functional aspects, i.e.: “trustworthiness”

    • “The machine that makes the machine”—Systems Engineering

    • Verification and Validation, Test and Evaluation & Agile Development

    • Currently: focused on “Autonomy V&V”

    • Not an expert in “Machine Learning”

  • © Mauricio Castillo-Effen

    I. Definitions

  • © Mauricio Castillo-Effen

    Definitions

    Trustworthiness (⇔ Certifiability)

    Justified confidence that a system will perform as expected

    Trust (⇔ Certification)

    ‣ Accepted dependence

    ‣ Implies assessment and issuance of a certificate

  • © Mauricio Castillo-Effen

    Definitions (cont’d)

    Dependability

    Ability to avoid service failures that are more frequent and more severe than is acceptable

    High Assurance

    Functional correctness, Safety, Security

    Resiliency

    Ability to recover (rapidly) in the presence of failures

  • © Mauricio Castillo-Effen

    Definitions (cont’d)

    Correctness

    Ability to deliver the intended functionality. For every input it delivers the expected output

    Safety

    Absence of catastrophic consequences on the user(s) and the environment.

    Security

    Confidentiality, Integrity, Availability

  • © Mauricio Castillo-Effen

    Definitions (cont’d)

    Learning-Enabled System

    A system that incorporates one or more learning-enabled components

    Learning-Enabled Component

    One that acquires and updates its behavior through a “learning process”

    Learning vs. Adaptation

    Learning implies improvement (in contrast to adaptation)

  • © Mauricio Castillo-Effen

    Learning Categories

    When learning occurs

    ‣ Offline: at design time

    ‣ Online: during operation

    Techniques (Source of “knowledge”)

    ‣ Supervised: learn from examples

    ‣ Reinforcement: learn by reward

    ‣ Unsupervised: make sense from data

  • © Mauricio Castillo-Effen

    Machine Learning

    “Loose” definition

    “An approach to Artificial Intelligence through learning from experience to find patterns in a set of data”

    Steps

    ‣ Training: Data gathering; Data preparation; Choose a model; Training; Evaluation; Parameter tuning

    ‣ Inference (Prediction): Deployment on target hardware

  • © Mauricio Castillo-Effen

    Criticality

    [Chart: example systems plotted by criticality (non-critical, mission-critical, safety-critical) versus complexity (hardware systems → learning-enabled systems): Alexa, power turbine, Boeing 787, UCAV, autonomous air taxi. Trustworthy cyber-physical systems are “EASY”; trustworthy learning-enabled systems are HARD; safety- and mission-critical learning-enabled systems are VERY HARD.]

  • © Mauricio Castillo-Effen

    “If you want to build a ship, don't drum up people to collect wood and don't assign them tasks and work, but rather teach them to long for the endless immensity of the sea.”

    –Antoine de Saint-Exupéry

    By Tiago Fioreze [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0) or GFDL (http://www.gnu.org/copyleft/fdl.html)], from Wikimedia Commons

    II. Benefits

  • © Mauricio Castillo-Effen

    Benefits of Learning in Avionics

    Assuming learning can inspire trust:

    Technical Benefits

    ✓Applicable to manned and unmanned platforms

    ✓Higher levels of autonomy not achievable through human-driven development (e.g.: coding). For example, ability to deal with uncertain, unstructured, and dynamic environments/situations

    ✓Ability to improve performance, safety, and security AFTER deployment

    ✓Faster development

    Business Benefits

    ✓Reduced development costs

    ✓Aviation’s Achilles’ Heel: innovation = new technology + commercialization

  • © Mauricio Castillo-Effen

    Ex. 1: Autonomous Air Taxi

    Technical benefits enabled by learning:

    ✓GPS-free self-localization

    ✓Collision avoidance for safe navigation in congested airspace

    ✓Safe autonomous takeoff and landing

    ✓Safe control in the presence of adverse weather conditions

    ✓Resiliency in the presence of contingencies

    Business benefits:

    ✓No pilot

    ✓Benefits to society associated with on-demand personal air transport

    Image: Alex Butterfield, licensed under the Creative Commons Attribution 2.0 Generic license (https://creativecommons.org/licenses/by/2.0/deed.en)

  • © Mauricio Castillo-Effen

    Ex. 2: UTM-Enabled Drones

    • Benefits enabled by learning:

    ✓Better ability to predict trajectories (nominal and off-nominal) in a wide variety of conditions ➠ better coordination

    ✓Better perception and control in adverse atmospheric conditions

    ✓Better ability to handle contingencies

  • © Mauricio Castillo-Effen

    Ex. 3: “Loyal Wingmen”

    Technical benefits enabled by learning:

    ✓Naturalistic human/machine interfaces (e.g.: NLP)

    ✓Complex “playbook” mission planning/re-planning and management

    ✓Reduced cognitive load on mission commander

    ✓Improved situational awareness provided to mission commander (through better perception)

    ✓Safety for human crew

    From AFRL’s video “Air Force 2030 – Call to Action”

  • © Mauricio Castillo-Effen

    David Demaret [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

    “There’s no free lunch”

    III. Challenges

  • © Mauricio Castillo-Effen

    Avionics Development Methodology for Certification

    [Diagram: the System of Interest (SoI) System Safety Group applies the ARP 4754A development process and the ARP 4761 safety-assessment process, feeding the DO-178 / DO-254 software and hardware processes.]

    Where does ML fit?

  • © Mauricio Castillo-Effen

    Guideline Documents

  • © Mauricio Castillo-Effen

    Requirements

    • Do requirements translate to training data requirements or “scenarios”? => How to design them?

    • How do we know that the quantity and quality of data are adequate?

    • Is it harder to define the functionality included in each learning-enabled component (e.g., one ML component for a variety of situations)?

    Avionics vs. ML:

    ‣ Avionics: “Statements that define operational, functional, or design characteristics or constraints. They must be unambiguous, testable or measurable, and necessary for product or process acceptability.” ML: functional and design characteristics are derived from data.

    ‣ Avionics: requirements development usually involves multiple iterations and use of successive decomposition and refinement. ML: iterations may translate to tuning, refinement, re-training, and data collection.
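One way the open question about the quantity and quality of training data can be made measurable is a scenario-coverage metric over a requirements-derived parameter grid. A minimal sketch, with hypothetical parameter names and bins:

```python
# Scenario-coverage metric sketch: fraction of a required parameter grid
# covered by collected data. Parameter names and bins are hypothetical.
import itertools

WIND = ["calm", "gusty"]
VISIBILITY = ["day", "night"]

def coverage(scenarios):
    """Fraction of (wind, visibility) cells hit by at least one scenario."""
    required = set(itertools.product(WIND, VISIBILITY))
    seen = {(s["wind"], s["visibility"]) for s in scenarios}
    return len(required & seen) / len(required)

data = [{"wind": "calm", "visibility": "day"},
        {"wind": "gusty", "visibility": "day"},
        {"wind": "calm", "visibility": "night"}]
```

A coverage below 1.0 identifies exactly which scenario cells still need data, turning a vague adequacy question into an inspectable gap list.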

  • © Mauricio Castillo-Effen

    Design Safety

    • Well-known / well-documented techniques used in avionics: Functional Hazard Assessment, Fault Tree Analysis, Failure Modes and Effects Analysis, Common Mode Analysis, etc.

    • Design Assurance Levels (DAL): A: Catastrophic, B: Hazardous, C: Major, D: Minor

    • Techniques work for known statistical properties of components (failure rates, exposure time, etc.)

    • Model-Based Design:

    • Techniques to capture design intent such that your design is analyzable—also in the presence of faults

    • Model-based safety analysis: incorporate models of off-nominal behavior (e.g.: AADL’s Error Model Annex)

    • How do we incorporate ML components that are inscrutable into MBD/safety analysis?

    • What are the failure modes and associated statistics?
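For contrast, classical fault-tree arithmetic shows why known failure statistics matter: with rates for independent basic events, the top-event probability is computable in closed form. A toy sketch with invented failure rates:

```python
# Fault-tree arithmetic for independent basic events: the kind of
# closed-form analysis that presumes known failure rates. Rates invented.

def p_or(*ps):
    # OR gate: probability that at least one independent event occurs.
    q = 1.0
    for p in ps:
        q *= (1.0 - p)
    return 1.0 - q

def p_and(*ps):
    # AND gate: probability that all independent events occur.
    q = 1.0
    for p in ps:
        q *= p
    return q

# Hypothetical top event: (sensor A AND sensor B fail) OR power fails.
top = p_or(p_and(1e-4, 1e-4), 1e-6)
```

An inscrutable ML component offers no such per-component failure rate, which is precisely the gap the slide's question points at.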

  • © Mauricio Castillo-Effen

    Operational Safety

    • Guidelines: “Safety Assessment of Aircraft in Commercial Service,” ARP 5150 / 5151

    • Ongoing safety assessment

    • Tools: Global Aviation Information Network (GAIN), Flight Operation Quality Assurance Program (FOQA), Operator’s Flight Safety Handbook (OFSH)

    • Operations for ML-enabled systems not established yet

    • How to absorb information to achieve improvement in safety? (“learn how to learn”)

    • How to encourage standardization while leaving room for competition?

    • How to protect proprietary information?

  • © Mauricio Castillo-Effen

    Verification

    [Chart: V&V cost over the development lifecycle: “~80–90% of faults introduced here … ~96% of faults found here.”]

    • Software complexity is a major challenge

    Examples: flight management systems, adaptive control laws

    • Verification costs higher than development costs

    • State-of-practice: test-based verification

    • Current efforts:

    - Formal methods to introduce tools and automation (DO-333 supplement to DO-178C)

    - Compositional verification

    Source: G. Brat

  • © Mauricio Castillo-Effen

    Verification in ML

    • State-of-practice: Data-Train-Test-Validate cycle

    • BUT

    • No requirements

    • How much to test?

    • If verification not satisfied => get more data, retrain?

    • Quality and quantity of data for training, testing, and validation?


  • © Mauricio Castillo-Effen

    Hardware

    • Applicable standard for hardware components: RTCA DO-254 / EUROCAE ED-80, “Design Assurance Guidance for Airborne Electronic Hardware”: only addresses standard CPUs, PLDs/FPGAs

    • State-of-practice: GPUs used for airborne display systems (whitepaper: Certification Authorities Software Team position paper CAST-29, “Use of COTS Graphical Processors (CGP) in Airborne Display Systems”)

    • NVIDIA, the vendor with the largest market share, does not offer devices that can be used in avionics-type applications. Example: Jetson TX2i: temperature range −40 °C to 85 °C vs. the required −55 °C to 125 °C; lifecycle of 10 years vs. a typical military platform life of 15–50 years

    • GPUs are considered too complex for traditional verification (multiple processors running asynchronously)

    • GPUs are not designed to the guidance of DO-254/ED-80

    • Rapid lifecycles: design errors may still be present; obsolescence management?

  • © Mauricio Castillo-Effen

    Real Time

    • Real time is a foundation of safety-criticality

    • Correct/predictable behavior means provable bounded Worst-Case-Execution-Time

    • “Real time” for gaming and simulation is different from “real time” in a safety-critical application

    • Guaranteeing low latency and real-time behavior is NOT the focus of OpenCL/CUDA or GPUs

    • OpenCL and CUDA are reported to exhibit “hitches” (GPU resets)
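The gap between measured and worst-case timing can be illustrated in miniature. A purely illustrative sketch (invented stand-in routine): recording the maximum observed latency over many runs, which is an observation, not a WCET proof, and that distinction is why safety-critical practice demands provable bounds.

```python
# Measurement-based timing of a stand-in inference routine. Illustrative
# only: the observed maximum UNDER-estimates true worst-case execution
# time (WCET), which is why provable bounds are required instead.
import time

def toy_inference(x):
    # Stand-in for a network forward pass.
    return sum(i * x for i in range(100))

def measure_max_latency(fn, arg, runs=1000):
    worst = 0.0
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(arg)
        worst = max(worst, time.perf_counter() - t0)
    return worst  # an observation, not a WCET bound

observed = measure_max_latency(toy_inference, 0.5)
```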

  • © Mauricio Castillo-Effen

    https://www.flickr.com/photos/40943981@N00/3351710447

    Attribution 2.0 Generic (CC BY 2.0)

    IV. Promising Directions

  • © Mauricio Castillo-Effen

    ACAS X Example

    • “A new NextGen collision avoidance system for aircraft has the potential to dramatically decrease unnecessary alerts by one third and cut collision risk in half.”

    • Flight tests with prototype and extensive simulations by the FAA

    • Standard to be formalized in 2018

    • Flight evaluations 2019

    From the NETALERT Newsletter (Eurocontrol), June 2013, “ACAS X: the future of airborne collision avoidance”:

    ■ Future surveillance environment: Both SESAR and NextGen make extensive use of new surveillance sources, especially satellite-based navigation and advanced ADS-B functionality. TCAS, however, relies solely on transponders on-board aircraft, which will limit its flexibility to incorporate these advances. A number of solutions (such as hybrid surveillance) have recently been introduced to TCAS to begin addressing some of the above. But adapting TCAS to the requirements of the future ATM system is likely to involve a complete and costly overhaul. Instead, the FAA has chosen to develop ACAS X.

    [The newsletter also notes that future operational concepts will reduce the spacing between aircraft, with which TCAS II in its current form is not compatible (it would alert too frequently to be useful), and that TCAS II is restricted to categories of aircraft capable of achieving specified performance criteria (e.g. a minimum rate of climb of 2,500 feet per minute), which excludes the likes of General Aviation (GA) and Unmanned Aircraft Systems (UAS).]

    How is ACAS X planned to differ from TCAS II? Two of the key differences between TCAS II and the current concept for ACAS X are the collision avoidance logic and the sources of surveillance data.

    Inside ACAS X: The ACAS X collision avoidance logic is best explained in two distinct phases, offline development and real-time operation.

    Offline development: ACAS X is based on a probabilistic model providing a statistical representation of the aircraft position in the future. It also takes into account the safety and operational objectives of the system, enabling the logic to be tailored to particular procedures or airspace configurations. This is fed into an optimisation process called dynamic programming to determine the best course of action to follow according to the context of the conflict. This takes account of a rewards-versus-costs system to determine which action would generate the greatest benefits (i.e. maintain a safe separation while implementing a cost-effective avoidance manoeuvre). Key metrics for operational suitability and pilot acceptability include minimizing the frequency of alerts that result in reversals, intentional intruder altitude crossings, or disruptive advisories in noncritical encounters.

    Real-time operation: The lookup table is used in real time on board the aircraft to resolve conflicts. ACAS X collects surveillance measurements from an array of sources (approximately every second). Various models are used (e.g. a probabilistic sensor model accounting for sensor error characteristics) to estimate a state distribution, which is a probability distribution over the current positions and velocities of the aircraft. The state distribution determines where to look in the numeric lookup table to determine the best action to take. If deemed necessary, resolution advisories are then issued to pilots.

    [Diagram: offline development (probabilistic model → optimisation process → numeric lookup table) feeds the real-time implementation (surveillance sensor measurements → inferred aircraft position estimate → state distribution → lookup table → resolution advisories).]
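The offline dynamic-programming step described above can be sketched in miniature: value iteration over a toy one-dimensional separation state, producing a lookup table of best actions. States, actions, rewards, and the discount factor here are all invented for illustration and bear no relation to the real ACAS X model.

```python
# Miniature sketch of ACAS-X-style offline dynamic programming: value
# iteration over a toy 1-D separation state, producing a lookup table of
# best actions. States, actions, rewards, and discount are invented.

SEPARATIONS = range(-3, 4)                       # toy relative-altitude states
ACTIONS = {"descend": -1, "hold": 0, "climb": 1}

def clip(s):
    # Keep the toy state inside the modeled range.
    return max(min(s, 3), -3)

def reward(sep, delta):
    collision = -100 if sep + delta == 0 else 0  # heavy cost for collision
    alert = -1 if delta != 0 else 0              # small cost for alerting
    return collision + alert

def value_iteration(steps=20, gamma=0.9):
    V = {s: 0.0 for s in SEPARATIONS}
    for _ in range(steps):                       # Bellman backups
        V = {s: max(reward(s, d) + gamma * V[clip(s + d)]
                    for d in ACTIONS.values())
             for s in SEPARATIONS}
    return V

def best_action(V, sep, gamma=0.9):
    # The "numeric lookup table" consulted on board: argmax over actions.
    return max(ACTIONS, key=lambda a: reward(sep, ACTIONS[a])
               + gamma * V[clip(sep + ACTIONS[a])])

V = value_iteration()
```

The rewards-versus-costs trade-off is visible even at this scale: with separation already safe the table says “hold” (alerting costs), while at zero separation it commands a maneuver despite the alert cost.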

  • © Mauricio Castillo-Effen

    ACAS XU Verification

    • ACAS X requires 2 GB of memory

    • A Deep Neural Network can replace the ACAS X decision software with just a 3 MB memory footprint

    • Challenge: verification of DNN properties: (i) the system does not give unnecessary turning advisories; (ii) alerting regions are uniform and do not contain inconsistent alerts; and (iii) strong alerts do not appear for high τ values (τ: time until loss of vertical separation)

    • Tool used: Reluplex. From Katz et al., “Reluplex: An Efficient SMT Solver…”:

    A DNN implementation of ACAS Xu presents new certification challenges. Proving that a set of inputs cannot produce an erroneous alert is paramount for certifying the system for use in safety-critical settings. Previous certification methodologies included exhaustively testing the system in 1.5 million simulated encounters [20], but this is insufficient for proving that faulty behaviors do not exist within the continuous DNNs. This highlights the need for verifying DNNs and makes the ACAS Xu DNNs prime candidates on which to apply Reluplex.

    Network functionality: The ACAS Xu system maps input variables to action advisories. Each advisory is assigned a score, with the lowest score corresponding to the best action. The input state is composed of seven dimensions which represent information determined from sensor measurements [19]: (i) ρ: distance from ownship to intruder; (ii) θ: angle to intruder relative to ownship heading direction; (iii) ψ: heading angle of intruder relative to ownship heading direction; (iv) v_own: speed of ownship; (v) v_int: speed of intruder; (vi) τ: time until loss of vertical separation; and (vii) a_prev: previous advisory. There are five outputs which represent the different horizontal advisories that can be given to the ownship: Clear-of-Conflict (COC), weak right, strong right, weak left, or strong left. Weak and strong mean heading rates of 1.5°/s and 3.0°/s, respectively. [Fig. 6: geometry for the ACAS Xu horizontal logic table, showing ownship (v_own) and intruder (v_int).] The array of 45 DNNs was produced by discretizing τ and a_prev, and producing a network for each discretized combination. Each of these networks thus has five inputs (one for each of the other dimensions) and five outputs. The DNNs are fully connected, use ReLU activation functions, and have 6 hidden layers with a total of 300 ReLU nodes each.

    Network properties: It is desirable to verify that the ACAS Xu networks assign correct scores to the output advisories in various input domains. Fig. 7 illustrates this kind of property by showing a top-down view of a head-on encounter scenario, in which each pixel is colored to represent the best action if the intruder were at that location. We expect the DNN’s advisories to be consistent in each of these regions; however, Fig. 7 was generated from a finite set of input samples, and there may exist other inputs for which a wrong advisory is produced, possibly leading to collision. Therefore, we used Reluplex to prove properties from the following categories on the DNNs: (i) the system does not give unnecessary turning advisories; (ii) alerting regions are uniform and do not contain inconsistent alerts; and (iii) strong alerts do not appear for high τ values. [Fig. 7: advisories (COC, WL, SL, SR, WR) over downrange and crossrange (kft) for a head-on encounter with a_prev = COC, τ = 0 s.]

    Evaluation: We used a proof-of-concept implementation of Reluplex to check realistic properties on the 45 ACAS Xu DNNs. Our implementation consists of three main logical components: (i) a simplex engine for providing core functionality such as tableau representation and pivot and update operations; (ii) a Reluplex engine for driving the search and performing bound derivation, ReLU pivots and ReLU updates; and (iii) a simple SMT core for providing splitting-on-demand services. For the simplex engine we used the GLPK open-source LP solver (www.gnu.org/software/glpk/) with some modifications, for instance in order to allow the Reluplex core to perform bound tightening on tableau equations calculated by GLPK. Our implementation, together with the experiments described in this section, is available online [14]. Our search strategy was to repeatedly fix any out-of-bounds violations first, and only then correct any violated ReLU constraints (possibly introducing new out-of-bounds violations). We performed bound tightening on the entering variable after every pivot operation, and a more thorough bound tightening on all the equations in the tableau once every few thousand pivot steps. Tighter bound derivation proved extremely useful, and we often observed that after splitting on about 10% of the ReLU variables it led to the elimination of all remaining ReLUs. We counted the number of times a ReLU pair was fixed via Update_b or Update_f or pivoted via PivotForRelu, and split only when this number reached 5 (a number empirically determined to work well). We also implemented conflict analysis and back-jumping. Finally, we checked the accumulated round-off error (due to the use of double-precision floating point arithmetic).
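The core idea of verifying a whole input region, rather than finitely many sampled points, can be illustrated with interval bound propagation through a single ReLU unit. This is a far weaker technique than Reluplex's SMT-based search, and the weights, bias, and input ranges below are invented for illustration.

```python
# Toy interval bound propagation through one ReLU unit: bound the output
# over a whole box of inputs instead of testing finitely many points.
# Weights, bias, and input ranges are invented for illustration.

def relu_interval(lo, hi):
    # ReLU is monotone, so it maps an interval endpoint-wise.
    return max(lo, 0.0), max(hi, 0.0)

def layer_interval(weights, bias, in_lo, in_hi):
    """Bound relu(w.x + b) over the box [in_lo, in_hi], element-wise."""
    lo = hi = bias
    for w, l, h in zip(weights, in_lo, in_hi):
        lo += w * l if w >= 0 else w * h   # endpoint minimizing w*x
        hi += w * h if w >= 0 else w * l   # endpoint maximizing w*x
    return relu_interval(lo, hi)

# One unit with weights [1, -2], bias 0.5, both inputs ranging over [0, 1].
lo, hi = layer_interval([1.0, -2.0], 0.5, [0.0, 0.0], [1.0, 1.0])
```

A property such as “the output never exceeds 2 anywhere in this box” follows immediately from the computed bound, with no simulation of individual encounters.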

  • © Mauricio Castillo-Effen

    Safety for Autonomy

    • NASA’s AdvoCATE tool provided an effective means for generating the safety case required for obtaining a Certificate of Authorization (COA) for an autonomous operation

    • NASA Mizopex Mission (FY14)

    • Enabled the NASA team to get COA approval in less than 3 weeks and meet the mission launch window, which was closing

    • NASA team now helping with COAs for UTM (FY16)

    • Expanding beyond the classic GSN approach

    Assurance Case Approach to Certification

    • “A documented body of evidence that provides a convincing and valid argument that a specified set of critical claims regarding a system's properties are adequately justified for a given application in a given environment” Scott and Krombolz (2005)

    • Breaks dependence on specific artifacts and processes for certification, opening up possibilities for formulating alternative strategies and forms of evidence

    • Tooling for development, maintenance, and query of assurance cases

    • Example tool: AdvoCATE (Assurance Case Automation Toolset)

    E. Denney et al.
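The claim/evidence structure behind an assurance case can be sketched as a small tree with a support query. The class, claims, and evidence names below are invented; real tooling such as AdvoCATE is far richer (argument strategies, context, queries over patterns).

```python
# Minimal sketch of an assurance case as a claim tree with attached
# evidence, plus a query checking that every leaf claim is supported.
# Claims and evidence names are invented; real tools are far richer.

class Claim:
    def __init__(self, text, children=None, evidence=None):
        self.text = text
        self.children = children or []
        self.evidence = evidence or []

def fully_supported(claim):
    # A leaf claim needs evidence; an inner claim needs supported children.
    if not claim.children:
        return bool(claim.evidence)
    return all(fully_supported(c) for c in claim.children)

case = Claim("System is acceptably safe", children=[
    Claim("Hazards identified and mitigated", evidence=["FHA report"]),
    Claim("LEC behavior bounded at runtime", evidence=[]),  # open claim
])
```

This is what breaks the dependence on fixed process artifacts: any form of evidence that closes an open claim (a test report, a formal proof, a runtime monitor) can be attached and queried.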

  • © Mauricio Castillo-Effen

    DARPA’s Assured Autonomy Program

    [Diagram: program structure for an autonomous learning-enabled cyber-physical system (LE-CPS) built from components (C) and learning-enabled components (LEC) in a sensors, controller, actuators, plant loop with environment goals. TA1: Design for Assurance: safety-aware learning, new system models, new formal verification, new simulation-based testing, and new system testing, producing evidence (E) and conditional evidence (E’). TA2: Assurance Monitoring and Control: assurance monitors and guards around the autonomy components. TA3: Dynamic Assurance: a new assurance case in which claims (CL) are derived from and linked to evidence across design time, implementation, and operation time, yielding an assurance measure.]

    Distribution Statement “A” (Approved for Public Release, Distribution Unlimited)

  • © Mauricio Castillo-Effen

    Use of High Fidelity Simulation: Spectrum of Analysis Techniques

    From Kapinski et al., IEEE Control Systems Magazine, December 2016:

    …whether the system can be falsified, that is, whether there exists a p ∈ P and u ∈ U such that M(p, u) does not satisfy the specification φ. An important consequence of falsification is that a specific p ∈ P and u ∈ U demonstrating the violation is identified. This parameter and input provide the user with valuable information that can be used to debug the design.

    All testing and verification approaches rely on some form of requirements, either formal or informal, but the process of creating correct and useful requirements is an often underappreciated activity. Care should be taken to create requirements that accurately reflect the intended behavior of the system.

    Definition 4 (Requirement Engineering): Requirement engineering is the process of developing an appropriate specification φ.

    Requirement engineering remains a challenge for industry. Embedded control developers in many domains have made significant efforts to generate and document clear and concise requirements; however, challenges remain due to 1) the incompatibility between the form of the documented requirements and the input to existing verification and testing tools, 2) the ambiguous nature of requirements captured in natural language, 3) potential inconsistencies between requirements, and 4) the large number of requirements.

    Quality checking for embedded control systems: This section presents an overview of modeling and simulation techniques currently used in industry. Generally speaking, modeling is the process of developing an appropriate…

    Spectrum of analysis techniques: Many types of analyses can be performed on embedded control system designs. Each analysis approach has unique benefits and shortcomings, and each applies to a specific class of system representations. Figure S1 provides a subjective classification of various analysis approaches, based on the degree of exhaustiveness of the approach (how well it accounts for all possible behaviors of a model; left is less exhaustive, right is more exhaustive) and the scale of the model to which the approach can be applied (the level of detail and size of the models it can effectively address; lower is smaller scale, higher is larger scale). The techniques on the far left are classified as “testing/control techniques,” since they are based on individual (finite) sets of behaviors of the system model or provide information about only local behaviors; the techniques on the right fall under “verification,” since they account for all behaviors of the system models. Simulation, based on operating conditions that are either manually selected or selected using a Monte Carlo method, sits at the top left of the spectrum: it can be performed for models of any scale but provides only one example of the system behavior, so it scales well but does not provide exhaustive results. Two different types of linear analysis appear on the spectrum, numerical and symbolic; here, linear analysis refers to the process of applying Lyapunov’s indirect (first) method.

    [Figure S1: the spectrum of analysis techniques, from less formal/exhaustive to more formal/exhaustive and from less scalable to more scalable. For various types of analyses, the spectrum illustrates how thoroughly each one accounts for system behaviors and the level of complexity of the models that can be considered. Techniques shown include simulation, test vector generation for model coverage, falsification, testing, model checking, theorem proving, and proofs.]

  • © Mauricio Castillo-Effen

    Use of High Fidelity Simulation

    • Use by almost all major players in automated driving to “collect miles” but also to bootstrap learning

    • Valid if simulation exposes real emergent properties not easily captured by models developed by hand

    • How do we generate scenarios that are feasible and that provide “value”?

    • When do we move from simulation to the physical world?

    • What is the contribution of each form of evidence?

    Example: Microsoft’s AirSim (GitHub)
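One simple answer to “how do we generate feasible scenarios?” is rejection sampling against an explicit feasibility rule. A sketch with invented parameters and an invented constraint (it says nothing about which scenarios provide “value”, which remains the open question):

```python
# Rejection-sampling sketch for scenario generation: draw random scenario
# parameters and keep only those passing a feasibility rule. Parameter
# names, ranges, and the rule are invented for illustration.
import random

def sample_scenario(rng):
    return {"wind_mps": rng.uniform(0, 20),
            "visibility_m": rng.uniform(100, 10_000),
            "intruder_range_m": rng.uniform(50, 5_000)}

def feasible(s):
    # Hypothetical rule: no near intruders under very low visibility.
    return not (s["intruder_range_m"] < 200 and s["visibility_m"] < 500)

def generate(n, seed=0):
    rng = random.Random(seed)          # seeded for reproducible batches
    scenarios = []
    while len(scenarios) < n:
        s = sample_scenario(rng)
        if feasible(s):
            scenarios.append(s)
    return scenarios

batch = generate(10)
```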

  • © Mauricio Castillo-Effen

    DARPA Cyber Assured Systems Engineering

    TA 2: Drive resilient design

    A design-with-verification paradigm that guides the designer to resilient system designs

    Challenges:

    • Provide rigorous assurance of meeting requirements on complex systems

    • Apply inherently resilient, function-preserving design patterns

    • Reduce the sensitivity of a design to legacy functionality

    Approach:

    • Provide scalable formal methods tools

    o Formal proofs that a system design meets cyber requirements

    o Generated tests to validate the design model against physical systems

    • Develop a library of resiliency-supporting design patterns

    • Develop tools to specify and generate high-assurance cyber monitors

    [Diagram: cyber requirements (“shall not…”) drive a high-level design; the design is built to requirements, verified, and validated; formal methods and derived component cyber requirements yield a resilient design, checked against the physical system with validation tests.]

    Distribution A. Approved for public release: distribution unlimited.

  • © Mauricio Castillo-Effen

    Resilient Architectures

    • Are there architectural patterns that allow for “hardening” of systems incorporating ML-enabled components? Examples: redundancy, voting schemes, runtime monitoring with recovery

    • Can we develop tools that can assist with exploration of the design space of fault-tolerant architectures?
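The runtime-monitoring-with-recovery pattern mentioned above can be sketched as a simplex-style guard: an envelope check that falls back to a simple, verifiable controller whenever the ML command leaves the envelope. The safety limit and both controllers below are invented for illustration.

```python
# Simplex-style runtime assurance sketch: a monitor checks the ML command
# against a safety envelope and switches to a verified fallback when the
# envelope is violated. Limit and controllers are invented for illustration.

SAFE_LIMIT = 1.0  # hypothetical bound on the commanded value

def ml_controller(state):
    # Stand-in for an opaque learned policy.
    return state * 2.5

def fallback_controller(state):
    # Simple, verifiable control law: saturate into the safe envelope.
    return max(min(state, SAFE_LIMIT), -SAFE_LIMIT)

def guarded_command(state):
    cmd = ml_controller(state)
    if abs(cmd) <= SAFE_LIMIT:  # runtime monitor: envelope check
        return cmd, "ml"
    return fallback_controller(state), "fallback"
```

The appeal of this architecture is that only the monitor and fallback need to be assured to avionics standards; the learned component can remain inscrutable.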

  • © Mauricio Castillo-Effen

    Conclusions

    • Introduction of ML to avionics requires a marriage of two different cultures: innovation/market-driven vs. safety-driven

    • There is significant effort in improving efficiency of Verification and Validation in the aviation domain. ML introduces additional challenges/opportunities

    • Given the data-centric nature of ML techniques, metrics are needed

    • Choice of when to use ML vs. “traditional” design techniques still left to engineering intuition

    • Standardization has helped in avionics; what are the standards related to data, ML models, and techniques?

    • High fidelity simulation is promising but we need to study the limits of its validity

    • Assurance cases could represent a viable option for certification

  • © Mauricio Castillo-Effen

    Thanks!

    Questions?