autocalib: automatic calibration of traffic cameras at...

Post on 16-Sep-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

AutoCalib: Automatic Calibration of Traffic Cameras at Scale

Romil Bhardwaj†, Gopi Krishna Tummala*, Ganesan Ramalingam†, Ramachandran Ramjee†, Prasun Sinha*

†Microsoft Research, *The Ohio State University

50

150

250

350

450

2012 2013 2014 2015 2016

Nu

mb

er

of

Cam

era

s (M

illio

n)

Number of Security Cameras Worldwide

Source: IHS

Conventional Traffic Camera Uses

Post-facto Incident ReviewManual Surveillance

Emerging Traffic Camera Use Cases

Vehicle Speed Measurement(without dedicated sensors)

Traffic Analytics Near Miss Stats

All require distance measurements in the scene

Measuring Distances in an Image

220 px = 8 m

220 px = 34 m

Camera CalibrationReal-world Coordinates (m) <-> Image Coordinates (px)

Camera Calibration

𝑦 =𝑓π‘₯ 0 𝑐π‘₯0 𝑓𝑦 𝑐𝑦0 0 1

π‘Ÿ11 π‘Ÿ12 π‘Ÿ13 𝑑1π‘Ÿ21 π‘Ÿ22 π‘Ÿ23 𝑑2π‘Ÿ31 π‘Ÿ32 π‘Ÿ33 𝑑3

π‘₯

Intrinsic Matrix(Focal length, camera center)

Extrinsic Matrix(Rotation, Translation)

ImageCoordinates

Real WorldCoordinates

𝑇

𝑅

β€œHard” Calibration

𝑦 =𝑓π‘₯ 0 𝑐π‘₯0 𝑓𝑦 𝑐𝑦0 0 1

π‘Ÿ11 π‘Ÿ12 π‘Ÿ13 𝑑1π‘Ÿ21 π‘Ÿ22 π‘Ÿ23 𝑑2π‘Ÿ31 π‘Ÿ32 π‘Ÿ33 𝑑3

π‘₯

Intrinsic Matrix(Focal length, camera center)

Extrinsic Matrix(Rotation, Translation)

ImageCoordinates

Real WorldCoordinates

Not Scalable!

β€œSoft” Calibration

𝑦 =𝑓π‘₯ 0 𝑐π‘₯0 𝑓𝑦 𝑐𝑦0 0 1

π‘Ÿ11 π‘Ÿ12 π‘Ÿ13 𝑑1π‘Ÿ21 π‘Ÿ22 π‘Ÿ23 𝑑2π‘Ÿ31 π‘Ÿ32 π‘Ÿ33 𝑑3

π‘₯

Intrinsic Matrix(Focal length, camera center)

Extrinsic Matrix(Rotation, Translation)

ImageCoordinates

Real WorldCoordinates

β‰ˆEPnP Solver

β€œSoft” Calibration - Prior Art

Chessboard Calibration

Vanishing Points

Geometric Landmarks

No Chessboard Patternsin Traffic Views

Assumption ofStraight Line Motion

Assumption ofLandmarks

AutoCalib Overview

AutoCalib𝑇

𝑅

Traffic Video Calibration Estimate

AutoCalib: no humans-in-the-loop, robust calibration

Video FramesVehicle

DetectionKeypoint

Extraction

Calibrations Set

Vehicle Geometric Dimensions

Calibration

Geometry based filters

Calibration Values

Cropped Image Vehicle Keypoints

π‘ŒπΆ

𝑋𝐢𝑍𝐢

𝑋𝐺

π‘ŒπΊ

𝑍𝐺

𝑅, π‘‡π‘ΉπŸ, π‘»πŸπ‘ΉπŸ, π‘»πŸ

:

AutoCalib - Pipeline

Vehicle Detection

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Vehicle Detection

β€’ Off-the-shelf DNNs (Fast-RCNN, YOLO) promise state of the art accuracyβ€’ Expensive, scene often empty

β€’ Background Subtraction is fastβ€’ Inaccurate

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Solution - Trigger the DNN with Background Subtraction

Key-point Extraction

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Key-point Selection

Desired Properties

1. Visually Distinct

β€’ Ease of detection

2. Non-planar

β€’ Robust Calibrations

vs

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Key-point Extraction

β€’ Statistical vision based techniques aren’t robust to lighting variations

β€’ DNNs require a lot of labelled dataβ€’ No datasets available

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Transfer learn a DNN on a smaller dataset

Transfer Learning - Primer

Convolution and Pooling Layers(Generic Features)

Fully Connected Layers(Car Model Classification)

Output:BMW 3 Series

Transfer Learning - Primer

Convolution and Pooling Layers(Generic Features)

Fully Connected Layers(now detecting key-points)

Output:Key-points (x,y)

Transfer Learning - Less Data, Faster Training

Key-point DNN Dataset

β€’ Manually labelled key-points on 486 car images

β€’ Image Augmentation

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Original Img Horz Mirror Horz Mirror Rotate

Horz Mirror Crop

Original Crop

Original Rotate

Total of 10,344 images post augmentation

Key-point DNN Trainingβ€’ GoogLeNet architecture trained on CUHK CompCars dataset (CVPR β€˜15)

for Car make/model classification

β€’ Replaced last two fully connected layers with keypoint regression outputs

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Key-point DNN Performance

~80% of Key-points < 10% error

Calibration Estimation

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

𝑦 =𝑓π‘₯ 0 𝑐π‘₯0 𝑓𝑦 𝑐𝑦0 0 1

π‘Ÿ11 π‘Ÿ12 π‘Ÿ13 𝑑1π‘Ÿ21 π‘Ÿ22 π‘Ÿ23 𝑑2π‘Ÿ31 π‘Ÿ32 π‘Ÿ33 𝑑3

π‘₯

Intrinsic Matrix(Focal length, camera center)

Extrinsic Matrix(Rotation, Translation)

ImageCoordinates

Real WorldCoordinates

Vehicle Identification at low resolution…

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

… is hard!(for both, humans and machines)

Can’t identify… so, approximate!

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

R1, T1

R3, T3

R2, T2

Calibrate

n Modelsn Calibrations

(Toyota Prius, Toyota Corolla, Honda Civic, Volkswagen Jetta, BMW 320i, Audi A4, etc.)

Calibrate with most popular cars

Errors in Calibration

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Key-point Prediction ErrorsModel Approximation Errors

Statistical filters to remove outliers and average

Key Insight 1

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Ground plane should be consistent across all Calibrations

The Orientation Filter

1. For calibration 𝑅𝑖 , 𝑇𝑖 , its Z-axis orientation Ԧ𝑧

is defined by vector π‘…βˆ—,3𝑖

2. Let Τ¦π‘§π‘Žπ‘£π‘” = π΄π‘£π‘’π‘Ÿπ‘Žπ‘”π‘’(π‘…βˆ—,3𝑖 )

3. Pick 𝑛% calibrations with the least deviation

between Ԧ𝑧 and Τ¦π‘§π‘Žπ‘£π‘”

𝑅1, 𝑇1

𝑅2, 𝑇2

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Key Insight 2

Distance to a fixed point must be consistent across Calibrations

𝑑

𝑝

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

The Displacement Filter

β€’ Focus region: Region where cars are detected

β€’ For each Calibration:

1. Point 𝑝𝑖 = projection of center of focus region on the ground plane

using (𝑅𝑖 , 𝑇𝑖)

2. 𝑑𝑖 = Distance of 𝑝𝑖 to camera

β€’ Pick middle 𝑛% and filter the rest

𝑑𝑖

𝑝𝑖

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Filtering Overview

Orientation Filter (75%)

Displacement Filter (50%)

Average Rotation Matrix

Orientation Filter (75%)

(π‘…π‘“π‘–π‘›π‘Žπ‘™ , π‘‡π‘“π‘–π‘›π‘Žπ‘™)

(𝑅1, 𝑇1) (𝑅2, 𝑇2) (𝑅3, 𝑇3)

… . .

Displacement Filter (Pick median)

(π‘…π‘Žπ‘£π‘”, 𝑇1) (π‘…π‘Žπ‘£π‘”, 𝑇

2) … . .

Video Frames Vehicle DetectionKeypoint

ExtractionCalibrations SetCalibration

Geometry based filters

Calibration Values

Implementation

Azure Service – 4 Tesla K80s, 224 GB RAM

< 12% error with ~8 minutes of video

Evaluation - Dataset

β€’ 350+ hours from 10 traffic cameras in

Seattle

β€’ Resolution - 640x360 to 1280x720

β€’ Ground truth distances and calibration

estimated using Google Earth

A

B D

EF G

Camera Image

A

B D

EF G

8m

8m

12m9m

Google Earth View

Evaluation

AutoCalib vs Manual Calibration

4.8 5.3 5.1 5.5

8.2

1.83.0

5.1

1.5

5.9

9.8

12.3

7.9

10.6 11.1

6.7

10.211.1

5.1 5.1

0

4

8

12

16

20

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10

RM

S E

rror

(%)

Ground Distance Measurement, RMS Error (%)

Manual Calibration AutoCalib EstimateAutoCalib achieves <12% RMS error in measuring distances

AutoCalib vs Prior Art

9.812.3

7.910.6 11.1

6.7 10.2 11.1

5.15.1

16.8 14.920.3

28.8

15.8

5.4

23.019.4

14.7

56.8

0

10

20

30

40

50

60

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10

RM

S E

rro

r (%

)

Ground Distance Measurement, RMS Error (%)

AutoCalib Calibration VP Approach [1]

[1] DubskΓ‘ et al., Fully automatic Roadside Camera Calibration for Traffic Surveillance. IEEE ITS 2015

AutoCalib outperforms prior state of the art approaches

Does more video data help?

AutoCalib converges with increasing vehicle detections

Application – Speed Measurement

AutoCalib Summary

β€’ Camera Calibration

β€’ Enables distance measurements

β€’ Highly manual today

β€’ AutoCalib

β€’ Scalable automatic calibration

β€’ Uses DNNs to analyze vehicle geometry

β€’ Experiments

β€’ < 12% error in measuring distances

β€’ Calibrates with few hundred detections

top related