AutoCalib: Automatic Calibration of Traffic Cameras at Scale
Romil Bhardwaj†, Gopi Krishna Tummala*, Ganesan Ramalingam†, Ramachandran Ramjee†, Prasun Sinha*
†Microsoft Research, *The Ohio State University
[Figure: Number of Security Cameras Worldwide. Y-axis: Number of Cameras (Million); X-axis: 2012 to 2016. Source: IHS]
Conventional Traffic Camera Uses
Post-facto Incident Review
Manual Surveillance
Emerging Traffic Camera Use Cases
Vehicle Speed Measurement (without dedicated sensors)
Traffic Analytics
Near Miss Stats
All require distance measurements in the scene
Measuring Distances in an Image
220 px = 8 m
220 px = 34 m
Camera Calibration: Real-world Coordinates (m) <-> Image Coordinates (px)
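Camera Calibration
\[
y =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{bmatrix}
x
\]
y: Image Coordinates
x: Real World Coordinates
First matrix: Intrinsic Matrix (Focal length, camera center)
Second matrix: Extrinsic Matrix (Rotation, Translation)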
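The projection on this slide can be sketched in plain Python. This is only an illustration: the helper name `project` and every numeric value (focal length, camera center, point positions) are assumptions, not values from the talk. It also shows why equal pixel spans can correspond to different real-world distances.

```python
def project(K, R, t, X):
    """Project a 3-D world point X (metres) to pixel coordinates (u, v)."""
    # Extrinsic step: world coordinates -> camera coordinates, Xc = R X + t
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    # Intrinsic step: camera coordinates -> homogeneous image coordinates
    y = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    # Divide by the third (depth) component to obtain pixel coordinates
    return y[0] / y[2], y[1] / y[2]


# Illustrative values only (assumed, not taken from the talk)
K = [[1000.0, 0.0, 640.0],
     [0.0, 1000.0, 360.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]   # camera axes aligned with the world axes
t = [0.0, 0.0, 0.0]

near = project(K, R, t, [1.0, 0.0, 10.0])  # 1 m sideways, 10 m ahead
far = project(K, R, t, [1.0, 0.0, 40.0])   # 1 m sideways, 40 m ahead
```

With these assumed values, the same 1 m sideways offset lands 100 px from the image center at 10 m depth but only 25 px from it at 40 m depth.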
“Hard” Calibration
[Projection equation repeated: image coordinates = intrinsic matrix × extrinsic matrix × real-world coordinates]
Not Scalable!
“Soft” Calibration
[Projection equation repeated: image coordinates = intrinsic matrix × extrinsic matrix × real-world coordinates]
EPnP Solver
“Soft” Calibration - Prior Art
Chessboard Calibration: No Chessboard Patterns in Traffic Views
Vanishing Points: Assumption of Straight Line Motion
Geometric Landmarks: Assumption of Landmarks
AutoCalib Overview
Traffic Video -> AutoCalib -> Calibration Estimate
AutoCalib: no humans-in-the-loop, robust calibration
AutoCalib - Pipeline
[Pipeline diagram: Video Frames -> Vehicle Detection -> Cropped Image -> Keypoint Extraction -> Vehicle Keypoints -> Calibration (using Vehicle Geometric Dimensions) -> Calibrations Set -> Geometry based filters -> Calibration Values]
Vehicle Detection
• Off-the-shelf DNNs (Fast-RCNN, YOLO) promise state-of-the-art accuracy
  • Expensive, scene often empty
• Background Subtraction is fast
  • Inaccurate
Solution - Trigger the DNN with Background Subtraction
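One way to read this slide is as a cheap motion gate in front of an expensive detector. The sketch below is an assumption-laden illustration, not the authors' implementation: the running-average background model, the thresholds, and the `run_dnn` callable are all hypothetical stand-ins, and frames are plain nested lists of grey levels.

```python
def foreground_fraction(frame, background, threshold=25):
    """Fraction of pixels whose grey level differs from the background model."""
    changed = sum(
        1
        for row_f, row_b in zip(frame, background)
        for f, b in zip(row_f, row_b)
        if abs(f - b) > threshold
    )
    return changed / (len(frame) * len(frame[0]))


def update_background(background, frame, alpha=0.05):
    """Running-average background model (an assumed, very simple choice)."""
    return [
        [(1 - alpha) * b + alpha * f for f, b in zip(row_f, row_b)]
        for row_f, row_b in zip(frame, background)
    ]


def detect_vehicles(frames, run_dnn, min_fraction=0.02):
    """Call the expensive detector only on frames that show enough change."""
    frames = iter(frames)
    # The first frame initialises the background model
    background = [[float(v) for v in row] for row in next(frames)]
    detections = []
    for index, frame in enumerate(frames, start=1):
        if foreground_fraction(frame, background) >= min_fraction:
            detections.append((index, run_dnn(frame)))
        background = update_background(background, frame)
    return detections
```

Frames of an empty road never reach `run_dnn`, which is where the savings would come from when the scene is often empty.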
Key-point Extraction
Key-point Selection
Desired Properties
1. Visually Distinct
  • Ease of detection
2. Non-planar
  • Robust Calibrations
Key-point Extraction
• Statistical vision based techniques aren't robust to lighting variations
• DNNs require a lot of labelled data
  • No datasets available
Transfer learn a DNN on a smaller dataset
Transfer Learning - Primer
Convolution and Pooling Layers (Generic Features)
Fully Connected Layers (Car Model Classification)
Output: BMW 3 Series
Transfer Learning - Primer
Convolution and Pooling Layers (Generic Features)
Fully Connected Layers (now detecting key-points)
Output: Key-points (x, y)
Transfer Learning - Less Data, Faster Training
Key-point DNN Dataset
• Manually labelled key-points on 486 car images
• Image Augmentation
[Figure: augmentation examples: Original Img, Original Crop, Original Rotate, Horz Mirror, Horz Mirror Crop, Horz Mirror Rotate]
Total of 10,344 images post augmentation
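When an image is mirrored, cropped, or rotated, its labelled key-points have to be transformed the same way. The helpers below are a hypothetical sketch of that bookkeeping; the function names and the choice of rotating about an arbitrary center are assumptions, and the talk does not give the crop sizes or rotation angles used.

```python
import math


def mirror_keypoints(keypoints, image_width):
    """Horizontal mirror: x -> (width - 1) - x, y unchanged."""
    return [(image_width - 1 - x, y) for x, y in keypoints]


def crop_keypoints(keypoints, left, top):
    """Crop: shift key-points into the cropped image's coordinate frame."""
    return [(x - left, y - top) for x, y in keypoints]


def rotate_keypoints(keypoints, center, degrees):
    """Rotate key-points about `center`.

    Image coordinates have the y axis pointing down, so positive angles
    appear clockwise on screen.
    """
    cx, cy = center
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [
        (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
        for x, y in keypoints
    ]
```

If key-points carry left/right meaning (for example a left versus a right lamp), their labels would also need swapping after a mirror; that step is omitted here.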
Key-point DNN Training
• GoogLeNet architecture trained on CUHK CompCars dataset (CVPR '15) for car make/model classification
• Replaced last two fully connected layers with keypoint regression outputs
Key-point DNN Performance
~80% of Key-points < 10% error
Calibration Estimation
[Projection equation repeated: image coordinates = intrinsic matrix × extrinsic matrix × real-world coordinates]
Vehicle Identification at low resolution…
… is hard! (for both humans and machines)
Can't identify… so, approximate!
n Models -> Calibrate -> n Calibrations
(R1, T1), (R2, T2), (R3, T3), ...
(Toyota Prius, Toyota Corolla, Honda Civic, Volkswagen Jetta, BMW 320i, Audi A4, etc.)
Calibrate with most popular cars
Errors in Calibration
Key-point Prediction Errors
Model Approximation Errors
Statistical filters to remove outliers and average
Key Insight 1
Ground plane should be consistent across all Calibrations
The Orientation Filter
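1. For calibration $(R_i, T_i)$, its Z-axis orientation $\vec{z}_i$ is defined by the vector $(R_i)_{*,3}$, the third column of $R_i$
2. Let $\vec{z}_{avg} = \mathrm{Average}\big((R_i)_{*,3}\big)$
3. Pick a fixed percentage of the calibrations with the least deviation between $\vec{z}_i$ and $\vec{z}_{avg}$
[Figure labels: $(R_1, T_1)$, $(R_2, T_2)$]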
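The three steps of the orientation filter can be sketched as follows. This is a minimal illustration under stated assumptions: rotation matrices are 3x3 lists of rows, "deviation" is taken to be the angle between the two unit vectors (the slide does not define it), and the 75% default simply mirrors the figure quoted in the filtering overview. The function names are hypothetical.

```python
import math


def z_axis(R):
    """Third column of a 3x3 rotation matrix given as a list of rows."""
    return [R[0][2], R[1][2], R[2][2]]


def orientation_filter(rotations, keep_fraction=0.75):
    """Return indices of calibrations whose Z axis is closest to the average."""
    zs = [z_axis(R) for R in rotations]
    # Average the Z axes component-wise, then renormalise to unit length
    mean = [sum(z[k] for z in zs) / len(zs) for k in range(3)]
    norm = math.sqrt(sum(m * m for m in mean))
    z_avg = [m / norm for m in mean]

    def angle(z):
        # Angle between z and z_avg; clamp to guard against rounding error
        dot = sum(a * b for a, b in zip(z, z_avg))
        return math.acos(max(-1.0, min(1.0, dot)))

    ranked = sorted(range(len(zs)), key=lambda i: angle(zs[i]))
    keep = max(1, int(round(keep_fraction * len(zs))))
    return sorted(ranked[:keep])
```

Returning indices rather than matrices keeps each rotation paired with its translation for the later filtering stages.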
Key Insight 2
Distance to a fixed point must be consistent across Calibrations
The Displacement Filter
• Focus region: Region where cars are detected
• For each Calibration:
  1. Point $p_i$ = projection of center of focus region on the ground plane using $(R_i, T_i)$
  2. $d_i$ = Distance of $p_i$ to camera
• Keep a fixed middle percentage of the calibrations, ranked by $d_i$, and filter out the rest
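A rough sketch of the displacement filter follows. It rests on assumptions the slide does not spell out: the ground plane is Z = 0 in world coordinates, the camera model is X_cam = R X + T, the intrinsic matrix K has no skew, and the 50% default mirrors the filtering overview. The helper names are hypothetical.

```python
import math


def ground_point(K, R, t, pixel):
    """Intersect the viewing ray through `pixel` with the ground plane Z = 0."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    u, v = pixel
    ray_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]  # ray in the camera frame
    # R maps world -> camera, so its transpose maps camera -> world
    ray = [sum(R[r][c] * ray_cam[r] for r in range(3)) for c in range(3)]
    # Camera centre in world coordinates: C = -R^T t
    cam = [-sum(R[r][c] * t[r] for r in range(3)) for c in range(3)]
    s = -cam[2] / ray[2]  # scale at which the ray reaches Z = 0
    return [cam[k] + s * ray[k] for k in range(3)], cam


def displacement_filter(K, calibrations, focus_center, keep_fraction=0.5):
    """Keep the middle share of calibrations ranked by camera-to-point distance."""
    distances = []
    for R, t in calibrations:
        p, cam = ground_point(K, R, t, focus_center)
        distances.append(math.dist(p, cam))
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    keep = max(1, int(round(keep_fraction * len(ranked))))
    start = (len(ranked) - keep) // 2
    return sorted(ranked[start:start + keep]), distances
```

Taking the middle of the ranking discards calibrations that place the same image point implausibly near or far.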
Filtering Overview
Input: (R_1, T_1), (R_2, T_2), (R_3, T_3), ...
1. Orientation Filter (75%)
2. Displacement Filter (50%)
3. Average Rotation Matrix, giving (R_avg, T_1), (R_avg, T_2), ...
4. Orientation Filter (75%)
5. Displacement Filter (Pick median)
Output: (R_final, T_final)
Implementation
Azure Service - 4 Tesla K80s, 224 GB RAM
< 12% error with ~8 minutes of video
Evaluation - Dataset
• 350+ hours from 10 traffic cameras in Seattle
• Resolution - 640x360 to 1280x720
• Ground truth distances and calibration estimated using Google Earth
[Figure: Camera Image and Google Earth View of the same scene with matching points A, B, D, E, F, G; distances of 8m, 8m, 12m, and 9m marked on the Google Earth View]
Evaluation
AutoCalib vs Manual Calibration
Ground Distance Measurement, RMS Error (%)
Camera   Manual Calibration   AutoCalib Estimate
C1       4.8                  9.8
C2       5.3                  12.3
C3       5.1                  7.9
C4       5.5                  10.6
C5       8.2                  11.1
C6       1.8                  6.7
C7       3.0                  10.2
C8       5.1                  11.1
C9       1.5                  5.1
C10      5.9                  5.1
AutoCalib achieves <12% RMS error in measuring distances
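For reference, an RMS percentage error over a set of ground distances could be computed as below. The talk does not state the exact normalization, so treating each segment's error relative to its ground-truth length is an assumption, as are the function name and the sample numbers.

```python
import math


def rms_percent_error(estimated, ground_truth):
    """RMS of per-segment relative errors, expressed as a percentage."""
    relative = [(e - g) / g for e, g in zip(estimated, ground_truth)]
    return 100.0 * math.sqrt(sum(r * r for r in relative) / len(relative))


# Illustrative distances in metres (assumed, not measured values from the talk)
example = rms_percent_error([8.8, 10.8], [8.0, 12.0])
```

Here both segments are off by 10% in opposite directions, so `example` comes out at roughly 10.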
AutoCalib vs Prior Art
Ground Distance Measurement, RMS Error (%)
Camera   AutoCalib Calibration   VP Approach [1]
C1       9.8                     16.8
C2       12.3                    14.9
C3       7.9                     20.3
C4       10.6                    28.8
C5       11.1                    15.8
C6       6.7                     5.4
C7       10.2                    23.0
C8       11.1                    19.4
C9       5.1                     14.7
C10      5.1                     56.8
[1] Dubská et al., Fully Automatic Roadside Camera Calibration for Traffic Surveillance. IEEE ITS 2015
AutoCalib outperforms prior state-of-the-art approaches
Does more video data help?
AutoCalib converges with increasing vehicle detections
Application - Speed Measurement
AutoCalib Summary
• Camera Calibration
  • Enables distance measurements
  • Highly manual today
• AutoCalib
  • Scalable automatic calibration
  • Uses DNNs to analyze vehicle geometry
• Experiments
  • < 12% error in measuring distances
  • Calibrates with a few hundred detections