
Ensuring Autonomous Vehicle Perception Safety

Autonomous Think Tank

Gothenburg, Sweden, April 2019

© 2019 Edge Case Research

Prof. Philip Koopman

@PhilKoopman


Overview

Perception safety approaches
Capturing Edge Cases as the limiting factor
– Heavy Tail Distribution
Perception stress testing
– Accelerate building the triggering event “zoo”
Overview of hologram tool
– Significant reduction in required validation data

Brute Force AV Validation: Public Road Testing

Good for identifying “easy” cases
Expensive and potentially dangerous

[General Motors]
http://bit.ly/2toadfa


Validation Via Brute Force Road Testing?

If 100M miles/critical mishap…
Test 3x–10x longer than mishap rate
– Need 1 Billion miles of testing
That’s ~25 round trips on every road in the world
– With fewer than 10 critical mishaps…
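The mileage arithmetic above can be checked with a short sketch. It assumes critical mishaps arrive as a Poisson process; the 95% confidence level used to motivate the 3x lower bound (the zero-failure "rule of three") is our assumption, since the slide does not say where 3x comes from. At exactly the 100M-mile target rate, 1 billion miles would be expected to produce about 10 critical mishaps, which is consistent with the slide's "fewer than 10" bar.

```python
import math


def zero_failure_test_miles(miles_per_mishap: float, confidence: float) -> float:
    """Mishap-free miles needed to claim a rate no worse than 1 per miles_per_mishap.

    Assumes Poisson arrivals: P(0 mishaps in m miles) = exp(-m / miles_per_mishap).
    """
    return -math.log(1.0 - confidence) * miles_per_mishap


def expected_mishaps(test_miles: float, miles_per_mishap: float) -> float:
    """Mishaps expected in test_miles if the vehicle sits exactly at the target rate."""
    return test_miles / miles_per_mishap


TARGET = 100e6  # 100M miles per critical mishap, the slide's assumed target

lower = zero_failure_test_miles(TARGET, 0.95)  # roughly 3x the target (assumed 95%)
upper = 10 * TARGET                            # 10x the target: 1 billion miles
print(f"lower bound: {lower / TARGET:.2f}x target ({lower:.3g} miles)")
print(f"upper bound: {upper:.3g} miles, "
      f"{expected_mishaps(upper, TARGET):.0f} mishaps expected at the target rate")
```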


Closed Course Testing

Safer, but expensive
Not scalable
Only tests things you have thought of!

Volvo / Motor Trend


Simulation

Highly scalable; less expensive
Scalable; need to manage fidelity vs. cost
Only tests things you have thought of!

http://bit.ly/2K5pQCN
Udacity / ANSYS


Perception Safety Argument Approach

Track arrival rate and types of “surprises” and fix them
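A minimal sketch of "track arrival rate and types of surprises and fix them" follows. The `SurpriseLog` class, its method names, and the surprise type strings are hypothetical illustrations, not part of the talk or any tool; the sketch only shows the bookkeeping of miles driven, surprise counts per type, and which types have a fix.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SurpriseLog:
    """Hypothetical tracker for the arrival rate and types of perception surprises."""
    miles: float = 0.0
    by_type: Counter = field(default_factory=Counter)  # surprise type -> times observed
    fixed: set = field(default_factory=set)            # types with a deployed fix

    def drive(self, miles: float) -> None:
        self.miles += miles

    def record(self, kind: str) -> None:
        self.by_type[kind] += 1

    def mark_fixed(self, kind: str) -> None:
        self.fixed.add(kind)

    def miles_per_surprise(self) -> float:
        """Observed miles between surprises over all logged driving (fixed or not)."""
        total = sum(self.by_type.values())
        return self.miles / total if total else float("inf")

    def open_types(self) -> list:
        """Surprise types seen so far that still lack a fix, most frequent first."""
        return [k for k, _ in self.by_type.most_common() if k not in self.fixed]


log = SurpriseLog()
log.drive(2_000_000)
for kind in ("sun glare", "unusual obstacle", "sun glare"):
    log.record(kind)
log.mark_fixed("sun glare")
print(log.miles_per_surprise(), log.open_types())
```

Keeping counts per type, rather than a single total, is what lets the argument distinguish "the same few surprises keep recurring" (fixable) from "every surprise is new" (the edge case problem).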


What About Edge Cases?

You should expect the extreme, weird, unusual
– Unusual road obstacles
– Extreme weather
– Strange behaviors
Edge Cases are surprises
– You won’t see these in testing
Edge cases are the stuff you didn’t think of!

https://www.clarifai.com/demo

http://bit.ly/2In4rzj


Need An Edge Case “Zoo”

Novel objects (missing from zoo) are triggering events

http://bit.ly/2top1KD
http://bit.ly/2tvCCPK
https://goo.gl/J3SSyu
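The zoo idea can be sketched as a catalog of object categories with curated examples, where any observed category missing from the catalog is treated as a triggering event. The `EdgeCaseZoo` class and the category strings below are hypothetical; the sketch assumes each road observation already carries a category label (in practice that label would come from human review).

```python
from dataclasses import dataclass, field


@dataclass
class EdgeCaseZoo:
    """Hypothetical catalog of object categories with curated validation examples."""
    examples: dict = field(default_factory=dict)  # category -> number of curated examples

    def add(self, category: str, count: int = 1) -> None:
        self.examples[category] = self.examples.get(category, 0) + count

    def triggering_events(self, observed_categories):
        """Observed categories missing from the zoo, in first-seen order."""
        novel = []
        for category in observed_categories:
            if category not in self.examples and category not in novel:
                novel.append(category)
        return novel


zoo = EdgeCaseZoo()
for category in ("car", "pedestrian", "bicyclist"):
    zoo.add(category, 1_000)

road_log = ["car", "pedestrian", "horse-drawn cart", "car",
            "horse-drawn cart", "person in costume"]
novel = zoo.triggering_events(road_log)
print(novel)

for category in novel:  # after review, curate examples and grow the zoo
    zoo.add(category)
```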


Why Edge Cases Matter

Where will you be after 1 Billion miles of validation testing?
Assume 1 Million miles between unsafe “surprises”
Example #1: 100 “surprises” @ 100M miles / surprise
– All surprises seen about 10 times during testing
– With luck, all bugs are fixed
Example #2: 100,000 “surprises” @ 100B miles / surprise
– Only 1% of surprises seen during 1B mile testing
– Bug fixes give no real improvement (1.01M miles / surprise)

https://goo.gl/3dzguf
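Both examples can be reproduced with a small model. The assumptions are ours, not the slide's: each surprise type arrives independently as a Poisson process, and every type seen at least once in testing is fixed perfectly. Under those assumptions both examples start at 1M miles between surprises, but only Example #1 improves after 1B miles of testing; Example #2 lands at about 1.01M miles per surprise, matching the slide. (The Poisson model gives 0.995% of types seen, which the slide rounds to 1%.)

```python
import math


def after_fixing_seen(num_surprises: int, miles_per_surprise: float, test_miles: float):
    """Return (miles between surprises before testing, expected sightings per type,
    fraction of types seen, miles between surprises after fixing every seen type).

    Assumes Poisson arrivals per type and perfect fixes for every type seen.
    """
    fleet_mtbs = miles_per_surprise / num_surprises  # miles between any surprise
    sightings = test_miles / miles_per_surprise      # expected sightings of each type
    p_unseen = math.exp(-sightings)                  # P(a given type never appears)
    remaining = num_surprises * p_unseen             # expected types left unfixed
    mtbs_after = miles_per_surprise / remaining
    return fleet_mtbs, sightings, 1.0 - p_unseen, mtbs_after


for n, mps in [(100, 100e6), (100_000, 100e9)]:
    before, sightings, seen, after = after_fixing_seen(n, mps, 1e9)
    print(f"{n:>7} types: {sightings:g} sightings each, {seen:.1%} seen, "
          f"{before / 1e6:.2f}M -> {after / 1e6:.2f}M miles/surprise")
```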


Real World: Heavy Tail Distribution(?)

[Figure: heavy tail distribution; the head holds common things seen in testing, the long tail holds edge cases not seen in testing]
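To make the heavy-tail picture concrete, the sketch below assumes object types follow a Zipf-like distribution (our assumption; the slide itself hedges with "(?)") and computes, in closed form, two quantities after a given number of independent encounters: the fraction of distinct types seen at least once, and the fraction of encounters that belong to seen types. Under a heavy tail the second number can be high while the first stays low, which is why testing can look good on common things while most edge case types remain unseen.

```python
def zipf_probabilities(num_types: int, exponent: float) -> list:
    """Encounter probability of each type, proportional to 1 / rank**exponent."""
    weights = [rank ** -exponent for rank in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def coverage(probs: list, samples: int) -> tuple:
    """Return (fraction of types seen, fraction of encounter mass on seen types)."""
    seen = [1.0 - (1.0 - p) ** samples for p in probs]  # P(type seen at least once)
    types_seen = sum(seen) / len(probs)
    mass_seen = sum(p * s for p, s in zip(probs, seen))
    return types_seen, mass_seen


probs = zipf_probabilities(100_000, 1.0)
for n in (10_000, 1_000_000):
    t, m = coverage(probs, n)
    print(f"{n:>9} encounters: {t:.1%} of types seen, {m:.1%} of encounters covered")
```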


ML Is Brittle To Environment Changes

Sensor data corruption experiments
Defocus & haze are a significant issue
Gaussian Blur & Gaussian Noise cause similar failures

[Figure: Synthetic Equipment Faults; example panel: Gaussian blur]
Exploring the response of a DNN to environmental perturbations from “Robustness Testing for Perception Systems,” RIOT Project, NREC, DIST-A.
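The two perturbations named above can be sketched in NumPy. This is a generic illustration, not the code used in the RIOT experiments: it handles single-channel (grayscale) uint8 images only, uses a separable Gaussian kernel truncated at 3 sigma with edge padding, and clips noisy pixels back to the 0 to 255 range.

```python
import numpy as np


def add_gaussian_noise(image: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation sigma (in pixel units)."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of an H x W uint8 image (a crude defocus model)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    img = image.astype(np.float64)
    # Horizontal pass: pad columns so 'valid' convolution keeps width W.
    padded = np.pad(img, ((0, 0), (r, r)), mode="edge")
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, padded)
    # Vertical pass: pad rows so 'valid' convolution keeps height H.
    padded = np.pad(tmp, ((r, r), (0, 0)), mode="edge")
    out = np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, padded)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
frame = ((np.indices((16, 16)).sum(axis=0) % 2) * 255).astype(np.uint8)  # synthetic test pattern
blurred = gaussian_blur(frame, sigma=1.0)
noisy = add_gaussian_noise(frame, sigma=25.0, rng=rng)
```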


Stress Testing Perception

Augmenting images with noise highlights perception issues
Identifies systemic weaknesses even in absence of noise

[Figure: Mask R-CNN: examples of systemic problems. Panels: false negative when in front of dark vehicle; false negative when person next to light pole; false positive on lane marking; false negative on real bicyclist]

Example Triggering Events via Hologram

“Red objects”
“Columns”
“Camouflage”
“Sun glare”
“Bare legs”
“Children”
“Single Lane Control”

Notes: These are baseline, un-augmented images. (Your mileage may vary on your own trained neural network.)


Hologram Detects Edge Cases

Brittle perception behavior indicates Edge Cases
Data augmentation reveals triggering events

Customer Data Lake
Scalable: Can Run In Customer Data Center
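The idea that brittle behavior under augmentation flags edge cases can be illustrated with a generic sketch: perturb each input with noise several times and flag inputs whose label flips too often. This is not Hologram's actual algorithm, which the talk does not describe; the toy brightness-threshold detector, the noise levels, and the flip-rate threshold are all hypothetical stand-ins.

```python
import random


def add_noise(signal, sigma, rng):
    """Return a copy of the input with independent Gaussian noise on each element."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def instability(detector, sample, sigmas, trials, rng):
    """Fraction of noisy copies whose label differs from the clean-input label."""
    baseline = detector(sample)
    flips = total = 0
    for sigma in sigmas:
        for _ in range(trials):
            total += 1
            if detector(add_noise(sample, sigma, rng)) != baseline:
                flips += 1
    return flips / total


def flag_for_review(detector, samples, sigmas, trials, threshold, seed=0):
    """(index, flip rate) for samples whose label flips more often than threshold."""
    rng = random.Random(seed)
    flagged = []
    for idx, sample in enumerate(samples):
        score = instability(detector, sample, sigmas, trials, rng)
        if score > threshold:
            flagged.append((idx, score))
    return flagged


def toy_detector(pixels):
    """Hypothetical stand-in detector: labels by mean brightness."""
    return "pedestrian" if sum(pixels) / len(pixels) > 0.5 else "background"


samples = [[0.9] * 16, [0.52] * 16]  # a clear case and a borderline case
flagged = flag_for_review(toy_detector, samples, sigmas=(0.1, 0.2), trials=50, threshold=0.1)
print(flagged)
```

Only the borderline input should be flagged: its clean label is correct, but small perturbations flip it, which is the kind of instability that points at a gap in training data even when no noise is present on the road.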


Hologram Goes Beyond Disengagements

Disengagements only catch near-collision surprises


Hologram Multiplies Road Data Value

Holofactor = # Hologram detections per disengagement
Reduced road data collection to identify triggering events
Detection of rare triggering events missed by disengagements
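The Holofactor definition translates directly into a ratio, and a second helper shows why a larger Holofactor reduces road data collection. The helper assumes, for illustration only, that every detection is a genuine triggering event and that detections scale linearly with miles; the example numbers are made up, not measured results.

```python
def holofactor(hologram_detections: int, disengagements: int) -> float:
    """Hologram detections per disengagement, as defined on the slide."""
    if disengagements <= 0:
        raise ValueError("need at least one disengagement to form the ratio")
    return hologram_detections / disengagements


def road_miles_needed(target_events: int, miles_per_disengagement: float,
                      factor: float = 1.0) -> float:
    """Miles of road data needed to surface target_events triggering events,
    assuming each disengagement-equivalent stretch yields `factor` events."""
    return target_events * miles_per_disengagement / factor


hf = holofactor(120, 8)  # illustrative counts only
print(hf, road_miles_needed(100, 5_000), road_miles_needed(100, 5_000, hf))
```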

Safety Argumentation Strategies

Potential safety argumentation:
1. Holofactor validated by comparing to disengagement data
– Hologram detections predict disengagements before they happen
2. Arrival rate of dangerous surprises is acceptable
– Credit for safe failure fraction (non-dangerous surprises)
– Credit for system ability to detect and react to unknowns
– Defense-in-depth credit for vehicle safety features (e.g., AEB, airbags)
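One hypothetical way to combine the credits in item 2 is sketched below. It treats each credit as the probability that a surprise is harmless or mitigated at that layer and assumes the layers are independent, so the credits multiply; that independence assumption is ours and would need its own justification in a real safety case. The example values are illustrative, not from the talk.

```python
import math


def dangerous_miles_per_event(miles_per_surprise: float,
                              safe_failure_fraction: float,
                              detect_and_react: float,
                              defense_layers: tuple = ()) -> float:
    """Miles between unmitigated dangerous surprises, assuming independent credits."""
    for credit in (safe_failure_fraction, detect_and_react, *defense_layers):
        if not 0.0 <= credit <= 1.0:
            raise ValueError("each credit must be a probability in [0, 1]")
    p_unmitigated = (1.0 - safe_failure_fraction) * (1.0 - detect_and_react)
    for layer in defense_layers:  # e.g., AEB, each treated as an independent layer
        p_unmitigated *= 1.0 - layer
    return math.inf if p_unmitigated == 0.0 else miles_per_surprise / p_unmitigated


def is_acceptable(dangerous_mtbe: float, target_miles: float) -> bool:
    """Is the dangerous-surprise arrival rate no worse than the target?"""
    return dangerous_mtbe >= target_miles


mtbe = dangerous_miles_per_event(1e6, safe_failure_fraction=0.9,
                                 detect_and_react=0.5, defense_layers=(0.5,))
print(mtbe, is_acceptable(mtbe, 100e6))
```

With these illustrative numbers, 1M miles between surprises becomes 40M miles between unmitigated dangerous events, still short of a 100M-mile target, which shows how much weight each credit must carry.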


Hologram Benefits

1. Reduces amount of data collection required
– Detects non-disengagement hazards
2. Identifies SOTIF triggering events
– E.g., novel pedestrian categories
3. Identifies holes in training data
– Unstable classification under noise means gaps
4. Identifies “unknown unknown” objects & situations
– Especially context-dependent weaknesses
5. Works with unlabeled data
– Automatically identifies perception failures for human review


Did not notice the driver next to parked car


Ways To Improve AV Safety

More safety transparency
– Independent safety assessments
– Industry collaboration on safety
Minimum performance standards
– Share data on scenarios and obstacles
– Safety for on-road testing (driver & vehicle)
Autonomy software safety standards
– Traditional software safety … PLUS …
– Hologram for accelerated perception testing
– UL 4600 Autonomous Vehicle Safety Standard

http://bit.ly/2MTbT8F (sign modified)

Mars

Thanks!
