Ensuring Autonomous Vehicle Perception Safety
Autonomous Think Tank
Gothenburg, Sweden
April, 2019
© 2019 Edge Case Research
Prof. Philip Koopman
@PhilKoopman
Overview
Perception safety approaches
Capturing Edge Cases as the limiting factor
Heavy Tail Distribution
Perception stress testing
Accelerate building the triggering event “zoo”
Overview of Hologram tool
Significant reduction in required validation data
[General Motors]
Brute Force AV Validation: Public Road Testing
Good for identifying “easy” cases
Expensive and potentially dangerous
http://bit.ly/2toadfa
Validation Via Brute Force Road Testing?
If 100M miles/critical mishap…
– Test 3x–10x longer than mishap rate
– Need 1 Billion miles of testing
That’s ~25 round trips on every road in the world
– With fewer than 10 critical mishaps…
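The 3x–10x multiplier can be motivated with a simple zero-mishap argument. The sketch below assumes mishaps arrive as a Poisson process, which the slide does not state, and asks how confident one can be that the true interval is at least the 100M-mile target after driving some multiple of it without a mishap. The function names are illustrative.

```python
import math

TARGET_MILES_PER_MISHAP = 100e6  # from the slide: 100M miles per critical mishap


def confidence_after_clean_miles(test_miles, target=TARGET_MILES_PER_MISHAP):
    """Confidence that the true mishap interval is at least `target`,
    given zero mishaps in `test_miles` (assumes Poisson arrivals)."""
    # If the true interval were only `target`, a mishap-free run of
    # `test_miles` would happen with probability exp(-test_miles / target).
    return 1.0 - math.exp(-test_miles / target)


def clean_miles_needed(confidence, target=TARGET_MILES_PER_MISHAP):
    """Mishap-free miles needed to reach the requested confidence level."""
    return -math.log(1.0 - confidence) * target


if __name__ == "__main__":
    for multiple in (1, 3, 10):
        miles = multiple * TARGET_MILES_PER_MISHAP
        conf = confidence_after_clean_miles(miles)
        print(f"{multiple:>2}x target ({miles:.0e} miles): confidence {conf:.5f}")
```

Under this assumption, 3x the target interval gives roughly 95% confidence, and any mishap observed during the test pushes the required mileage higher.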
Closed Course Testing
Safer, but expensive
Not scalable
Only tests things you have thought of!
Volvo / Motor Trend
Simulation
Highly scalable; less expensive
Scalable; need to manage fidelity vs. cost
Only tests things you have thought of!
http://bit.ly/2K5pQCN
Udacity ANSYS
Perception Safety Argument Approach
Track arrival rate and types of “surprises” and fix them
What About Edge Cases?
You should expect the extreme, weird, unusual
– Unusual road obstacles
– Extreme weather
– Strange behaviors
Edge cases are surprises
– You won’t see these in testing
Edge cases are the stuff you didn’t think of!
https://www.clarifai.com/demo
http://bit.ly/2In4rzj
Need An Edge Case “Zoo”
Novel objects (missing from zoo) are triggering events
http://bit.ly/2top1KD
http://bit.ly/2tvCCPK
https://goo.gl/J3SSyu
Why Edge Cases Matter
Where will you be after 1 Billion miles of validation testing?
Assume 1 Million miles between unsafe “surprises”
Example #1: 100 “surprises” @ 100M miles / surprise
– All surprises seen about 10 times during testing
– With luck, all bugs are fixed
Example #2: 100,000 “surprises” @ 100B miles / surprise
– Only 1% of surprises seen during 1B mile testing
– Bug fixes give no real improvement (1.01M miles / surprise)
https://goo.gl/3dzguf
Real World: Heavy Tail Distribution(?)
Common Things Seen In Testing
Edge Cases Not Seen In Testing
(Heavy Tail Distribution)
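To illustrate why a heavy tail matters, the sketch below compares the expected probability mass of event types never encountered in testing under two distributions over the same number of types: uniform and Zipf-like. The Zipf-like shape, the number of types, and the observation count are assumptions chosen for illustration; the slide only suggests that the real distribution may be heavy-tailed.

```python
def uniform(num_types):
    """Every event type is equally likely."""
    return [1.0 / num_types] * num_types


def zipf_like(num_types, exponent=1.0):
    """Probability of the rank-r type is proportional to 1 / r**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def unseen_mass(probabilities, num_observations):
    """Expected probability that the next event is of a type that never
    appeared in `num_observations` independent test observations."""
    # A type with probability p stays unseen with probability (1 - p)**n.
    return sum(p * (1.0 - p) ** num_observations for p in probabilities)


if __name__ == "__main__":
    num_types, num_observations = 1000, 10_000
    for name, dist in (("uniform", uniform(num_types)),
                       ("zipf-like", zipf_like(num_types))):
        print(f"{name:>9}: unseen mass = {unseen_mass(dist, num_observations):.6f}")
```

With the same amount of testing, the heavy-tailed case leaves far more probability mass on types that were never observed.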
ML Is Brittle To Environment Changes
Sensor data corruption experiments
Synthetic Equipment Faults
Gaussian blur
Defocus & haze are a significant issue
Gaussian Blur & Gaussian Noise cause similar failures
Exploring the response of a DNN to environmental perturbations from “Robustness Testing for Perception Systems,” RIOT Project, NREC, DIST-A.
Stress Testing Perception
Augmenting images with noise highlights perception issues
Identifies systemic weaknesses even in absence of noise
False negative when in front of dark vehicle
False negative when person next to light pole
False positive on lane marking
False negative real bicyclist
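The general idea of noise-based stress testing can be sketched as follows. This is not Hologram's algorithm, which the slides do not describe. It is a minimal illustration that perturbs an image with Gaussian noise and measures how often a detector's output flips. `toy_detector` is a hypothetical stand-in for a real perception model, and the noise level and trial count are arbitrary.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a float image with values in [0, 1]."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def instability(image, detector, sigma=0.05, trials=50, seed=0):
    """Fraction of noisy copies whose detector output differs from the
    output on the clean image. Higher values suggest brittle behavior."""
    rng = np.random.default_rng(seed)
    baseline = detector(image)
    flips = sum(
        detector(add_gaussian_noise(image, sigma, rng)) != baseline
        for _ in range(trials)
    )
    return flips / trials


def toy_detector(image):
    """Hypothetical stand-in for a perception model: reports an object
    when the mean brightness exceeds 0.5."""
    return bool(image.mean() > 0.5)


if __name__ == "__main__":
    clear_case = np.full((32, 32), 0.9)   # far from the decision boundary
    borderline = np.full((32, 32), 0.5)   # sits on the decision boundary
    print("clear case instability:", instability(clear_case, toy_detector))
    print("borderline instability:", instability(borderline, toy_detector))
```

Images whose result changes under small perturbations are candidates for human review, even though the clean image alone gave no sign of trouble.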
Example Triggering Events via Hologram
Mask R-CNN: examples of systemic problems
“Red objects”
“Columns”
“Camouflage”
“Sun glare”
“Bare legs”
“Children”
“Single Lane Control”
Notes: These are baseline, un-augmented images. (Your mileage may vary on your own trained neural network.)
Hologram Detects Edge Cases
Brittle perception behavior indicates Edge Cases
Data augmentation reveals triggering events
Customer Data Lake
Scalable: Can Run In Customer Data Center
Hologram Goes Beyond Disengagements
Disengagements only catch near-collision surprises
Hologram Multiplies Road Data Value
Safety Argumentation Strategies
Holofactor = # Hologram detections per disengagement
– Reduced road data collection to identify triggering events
– Detection of rare triggering events missed by disengagements
Potential safety argumentation:
1. Holofactor validated by comparing to disengagement data
– Hologram detections predict disengagements before they happen
2. Arrival rate of dangerous surprises is acceptable
– Credit for safe failure fraction (non-dangerous surprises)
– Credit for system ability to detect and react to unknowns
– Defense-in-depth credit for vehicle safety features (e.g., AEB, airbags)
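The Holofactor definition is a plain ratio, and its effect on data collection follows directly. The sketch below uses made-up counts. The function names, and the assumption that detections are spread evenly over the same miles as the disengagements, are illustrative rather than taken from the slides.

```python
def holofactor(num_detections, num_disengagements):
    """Hologram detections per disengagement over the same road data."""
    if num_disengagements <= 0:
        raise ValueError("need at least one disengagement to form the ratio")
    return num_detections / num_disengagements


def miles_per_detection(miles_per_disengagement, factor):
    """Miles of road data per detection, assuming detections are spread
    evenly over the same miles as the disengagements."""
    return miles_per_disengagement / factor


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    factor = holofactor(num_detections=500, num_disengagements=10)
    print("Holofactor:", factor)
    print("Miles per detection:", miles_per_detection(5000.0, factor))
```

A larger Holofactor means each collected mile yields more candidate triggering events, which is the sense in which less road data would be needed.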
Hologram Benefits
1. Reduces amount of data collection required
– Detects non-disengagement hazards
2. Identifies SOTIF triggering events
– E.g., novel pedestrian categories
3. Identifies holes in training data
– Unstable classification under noise means gaps
4. Identifies “unknown unknown” objects & situations
– Especially context-dependent weakness
5. Works with unlabeled data
– Automatically identifies perception failures for human review
Did not notice the driver next to parked car
Ways To Improve AV Safety
More safety transparency
– Independent safety assessments
– Industry collaboration on safety
Minimum performance standards
– Share data on scenarios and obstacles
– Safety for on-road testing (driver & vehicle)
Autonomy software safety standards
– Traditional software safety … PLUS …
– Hologram for accelerated perception testing
– UL 4600 Autonomous Vehicle Safety Standard
http://bit.ly/2MTbT8F (sign modified)
Mars
Thanks!