TRANSCRIPT
Deeptails Seminar #1
July 9th, 2020
How To Train Your Deep Multi-Object Tracker
Presenters: Yihong Xu¹ and Xavier Alameda-Pineda¹
Joint work with: Aljosa Osep², Yutong Ban¹,³, Radu Horaud¹ and Laura Leal-Taixe²
¹Inria, LJK, MIAI, Univ. Grenoble Alpes, France; ²Technical University of Munich, Germany; ³Distributed Robotics Lab, CSAIL, MIT, USA
Research Page | Download Code
Deeptails
http://project.inria.fr/ml3ri/deeptails – [email protected] 1/26
Deeptails Seminars?
Rationale: deep learning requires engineering.
Aim: discuss best engineering practices together with methodology.
Format: a not-so-young researcher (methodology) paired with a young researcher (deep details, or "deeptails") – around 30 min.
"The Devil is in the Deeptails": A Series of Seminars on the Engineering behind Science
https://project.inria.fr/ml3ri/deeptails/
Motivation
[Figure: the standard pipeline. A deep multi-object tracker predicts boxes from RGB input images. Training: predictions are matched to ground-truth bounding boxes by the Hungarian algorithm (optimal assignment) and trained with an L2 loss. Evaluation: MOT metrics.]
Motivation
[Figure: the proposed pipeline. At training time, the Hungarian algorithm and the L2 loss are replaced by a Deep Hungarian Network (differentiable assignment) and a MOT loss, so that training optimises a differentiable proxy of the MOT metrics used at evaluation.]
Methodology: Table of Contents
◮ Standard Practice: HA and MOT Metrics
◮ DHN: Deep Hungarian Net
◮ DeepMOT Loss: MOTA and MOTP Approximations
◮ Some Results
Standard practice
1. Compute the distance matrix D_t between predicted and ground-truth bboxes.
2. Apply the Hungarian algorithm to obtain the optimal assignment matrix A_t.
At train time: compute the L2 distance, and back-propagate this error.
At test time: compute
MOTA = 1 − Σ_t (FP_t + FN_t + IDS_t) / Σ_t TP_t,
MOTP = Σ_t Σ_{n,m} d_t^{nm} a*_t^{nm} / Σ_t |TP_t|.
MOTA is the classification accuracy.
MOTP is the estimated bbox precision.
Two issues: neither the HA nor MOTA/P are differentiable procedures.
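Step 2 above can be sketched with SciPy's implementation of the Hungarian algorithm; the distance values below are made up for illustration:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy distance matrix D_t: rows = predicted boxes, columns = ground-truth boxes.
D = np.array([[0.1, 0.9, 0.8],
              [0.7, 0.2, 0.9],
              [0.8, 0.9, 0.3]])

# Hungarian algorithm: optimal assignment minimising the total distance.
rows, cols = linear_sum_assignment(D)

# Binary assignment matrix A_t.
A = np.zeros_like(D)
A[rows, cols] = 1.0
print(A)                    # here prediction i is matched to ground truth i
print(D[rows, cols].sum())  # total matching cost: 0.1 + 0.2 + 0.3
```

The assignment step itself is a discrete, non-differentiable operation, which is exactly the issue the next slides address.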
DHN: approximating the HA
We propose the Deep Hungarian Net (DHN) to approximate the HA:
◮ DHN must accept inputs of different sizes.
◮ DHN must have a global receptive field.
◮ ⇒ DHN is designed by combining flattening and Bi-RNNs.
[Figure: DHN architecture. The M × N distance matrix D (tracks to ground truth) is flattened row-wise and fed to a seq-to-seq Bi-RNN (2 × hidden units), reshaped into the first-stage hidden representation, flattened column-wise and fed to a second seq-to-seq Bi-RNN, then passed through FC layers and a sigmoid and reshaped into the M × N soft assignment matrix Ã.]
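A minimal PyTorch sketch of this two-stage flatten + Bi-RNN design; the class name, hidden size, GRU cells and FC widths are illustrative choices, not the exact seminar configuration:

```python
import torch
import torch.nn as nn

class DHNSketch(nn.Module):
    """Two-stage Bi-GRU over row-wise and column-wise flattenings of D."""
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn_row = nn.GRU(1, hidden, bidirectional=True, batch_first=True)
        self.rnn_col = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                nn.Linear(hidden, 1))

    def forward(self, D):                      # D: (M, N), any size
        M, N = D.shape
        # First stage: row-wise flatten -> seq-to-seq Bi-RNN.
        x = D.reshape(1, M * N, 1)
        h1, _ = self.rnn_row(x)                # (1, M*N, 2*hidden)
        # Reshape, then column-wise flatten for the second stage.
        h1 = h1.reshape(M, N, -1).transpose(0, 1).reshape(1, N * M, -1)
        h2, _ = self.rnn_col(h1)               # (1, N*M, 2*hidden)
        # FC layers + sigmoid -> soft assignment matrix A~ of shape (M, N).
        return torch.sigmoid(self.fc(h2)).reshape(N, M).t()

D = torch.rand(3, 5)
A_soft = DHNSketch()(D)
print(A_soft.shape)   # (3, 5), entries in (0, 1)
```

Because the Bi-RNNs run over the flattened sequences, every output entry can depend on every input distance, which is how the sketch gets the global receptive field required above.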
DeepMOT loss: approximating the MOTA and MOTP
[Figure: the distance matrix D (with ∞ marking forbidden entries) is fed to the Deep Hungarian Net, which outputs the soft assignment matrix Ã; a) a constant δ column/row plus a row-/column-wise softmax yields soft FP and FN counts; b) the FN assignments are compared against those of frame t−1 to count ID switches; c) a TP mask applied element-wise to D gives the soft MOTP numerator dMOTP = ‖B^TP‖₀.]
D and à have estimated objects as rows and ground-truth objects as columns.
To compute the FP count: complete with a constant column + row-wise softmax.
Analogous for the FN count.
ID switches are computed by masking the FN-matrix assignments at the previous frame.
We have MOTA. To obtain MOTP, we mask the distance matrix D.
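The constant-δ completion plus softmax described here can be sketched in NumPy; δ = 0.5 and the low temperature are taken from the slides, while the soft assignment matrix is hand-made for illustration:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def soft_fp_fn(A_soft, delta=0.5, T=0.01):
    """Soft FP/FN counts: rows = estimated objects, columns = ground truth."""
    M, N = A_soft.shape
    # FP: append a constant delta column, then row-wise softmax; a predicted
    # object with no strong match puts its mass on the delta column.
    Cr = softmax(np.hstack([A_soft, np.full((M, 1), delta)]) / T, axis=1)
    # FN: append a constant delta row, then column-wise softmax; a ground-truth
    # object with no strong match puts its mass on the delta row.
    Cc = softmax(np.vstack([A_soft, np.full((1, N), delta)]) / T, axis=0)
    return Cr[:, -1].sum(), Cc[-1, :].sum()

# Two predictions, three ground-truth objects: the third ground truth is missed.
A = np.array([[0.9, 0.0, 0.1],
              [0.0, 0.8, 0.2]])
fp, fn = soft_fp_fn(A)
print(round(fp, 2), round(fn, 2))  # ~0.0 and ~1.0
```

Both counts are smooth functions of Ã, so gradients can flow back through the DHN to the tracker.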
Using the approximation
[Figure: the deep multi-object tracker takes RGB input images; at training time, its predictions are matched to ground-truth bounding boxes by the Deep Hungarian Network (differentiable assignment) and scored with the MOT loss; at evaluation time, the MOT metrics are computed.]
Thanks to the DHN and the (Deep)MOT loss, we optimise a proxy of the evaluation metrics.
Tracking Results Visualization
[Figure: qualitative comparisons over frames t to t + 3.]
Table: Original vs. Ours: on IDS (top), FN (bottom-left) and FP (bottom-right).
Quantitative comparison

Method            MOTA↑ MOTP↑ IDF1↑ MT↑  ML↓  FP↓    FN↓     IDS↓

MOT17
DeepMOT-Tracktor  53.7  77.2  53.8  19.4 36.6 11731  247447  1947
Tracktor          53.5  78.0  52.3  19.5 36.6 12201  248047  2072
eHAF              51.8  77.0  54.7  23.4 37.9 33212  236772  1834
FWT               51.3  77.0  47.6  21.4 35.2 24101  247921  2648
jCC               51.2  75.9  54.5  20.9 37.0 25937  247822  1802
MOTDT17           50.9  76.6  52.7  17.5 35.7 24069  250768  2474
MHT DAM           50.7  77.5  47.2  20.8 36.9 22875  252889  2314

MOT16
DeepMOT-Tracktor  54.8  77.5  53.4  19.1 37.0 2955   78765   645
Tracktor          54.4  78.2  52.5  19.0 36.9 3280   79149   682
HCC               49.3  79.0  50.7  17.8 39.9 5333   86795   391
LMP               48.8  79.0  51.3  18.2 40.1 6654   86245   481
GCRA              48.2  77.5  48.6  12.9 41.1 5104   88586   821
FWT               47.8  75.5  44.3  19.1 38.2 8886   85487   852
MOTDT             47.6  74.8  50.9  15.2 38.3 9253   85431   792
Deeptails: Table of Contents
◮ Distance Matrix Calculation
◮ Deep Hungarian Network (DHN)
  ◮ Data Augmentation
  ◮ Training Strategy
  ◮ Ablation Study
◮ DeepMOT Training
  ◮ Soft Discretization
  ◮ Data Augmentation
  ◮ Training Strategy
  ◮ Ablation Study
[Figure: DeepMOT overview. RGB images feed the deep multi-object tracker; its bounding boxes are matched by the Deep Hungarian Net and scored with the DeepMOT loss, whose gradients flow back to the tracker.]
Distance Matrix Calculation 1/2
[Figure: two pairs of non-overlapping boxes, both with IoU = 0.]
D_IoU = 1 − IoU = 1 if there is no overlap → zero gradient.
⇒ We use D = (D_L2 + D_IoU)/2, with
D_L2 = 1 − exp(−5 D_norm),
D_norm = [(x − x̄)² + (y − ȳ)²]/d², d² = h² + w², with h, w the height/width;
(x, y) and (x̄, ȳ) are the centers of the predicted/ground-truth bbox.
[Figure: 1 − IoU, D_norm, D_L2 and 0.5·(D_L2 + 1 − IoU) plotted against the center distance (0–600).]
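The combined distance can be sketched as follows for axis-aligned boxes given as (x1, y1, x2, y2); taking d² as the squared image diagonal is an assumption of this sketch:

```python
import numpy as np

def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

def pair_distance(pred, gt, img_h, img_w):
    """D = (D_L2 + D_IoU) / 2, with D_L2 = 1 - exp(-5 * D_norm)."""
    cx = lambda t: ((t[0] + t[2]) / 2, (t[1] + t[3]) / 2)
    (xp, yp), (xg, yg) = cx(pred), cx(gt)
    d2 = img_h ** 2 + img_w ** 2                # normalising squared diagonal
    d_norm = ((xp - xg) ** 2 + (yp - yg) ** 2) / d2
    d_l2 = 1.0 - np.exp(-5.0 * d_norm)
    d_iou = 1.0 - iou(pred, gt)
    return 0.5 * (d_l2 + d_iou)

box = (10, 10, 50, 80)
print(pair_distance(box, box, 480, 640))   # 0.0 for identical boxes
far = (400, 300, 440, 370)
print(pair_distance(box, far, 480, 640))   # > 0.5: D_IoU saturates at 1
```

The D_L2 term is what keeps the gradient alive once the boxes stop overlapping and D_IoU flattens out at 1.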
Distance Matrix Calculation 2/2
[Figure: a Faster R-CNN-style network (conv1–conv5, RPN, ROI pooling, FC layers, softmax/bbox heads) extended with a re-identification branch (1×1 conv, FC, L2 normalize, concat). Source: image modified from Hoang Ngan Le, T., et al. 2016.]
Appearance vectors (F): ROI pooling + reid branch.
Cosine distance:
D_cos = 0.5 (1 − (F_gt · F_pred) / (‖F_gt‖ ‖F_pred‖))
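The appearance term can be sketched directly from the formula; the vectors below are toy examples, not real appearance features:

```python
import numpy as np

def cosine_distance(f_gt, f_pred):
    """D_cos = 0.5 * (1 - cos(F_gt, F_pred)), mapped into [0, 1]."""
    cos = np.dot(f_gt, f_pred) / (np.linalg.norm(f_gt) * np.linalg.norm(f_pred))
    return 0.5 * (1.0 - cos)

same = np.array([1.0, 2.0, 3.0])
print(cosine_distance(same, same))                                   # 0.0: identical
print(cosine_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.5: orthogonal
```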
DHN-Data Augmentation
[Figure: top, a distance matrix D is thresholded at three different values (e.g. 0.65, 0.55, 0.42; entries above the threshold are set to ∞), producing D1, D2, D3; bottom, random row and column permutations of D produce further training matrices.]
◮ We randomly threshold the DHN input distance matrix D with three different thresholds, constructing a dataset of 114,483 training and 17,880 testing instances.
◮ During DHN training, with a probability of 0.5 (uniform distribution), we randomly permute the rows/columns of the distance matrix and of its corresponding target assignment matrix.
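The permutation augmentation can be sketched as follows; the key point is that D and its target assignment matrix must be permuted with the same row/column orders (the matrices below are toy examples):

```python
import numpy as np

rng = np.random.default_rng(0)

def permute_pair(D, A, p=0.5):
    """With probability p, permute rows/columns of D and its target A consistently."""
    if rng.random() < p:
        r = rng.permutation(D.shape[0])
        c = rng.permutation(D.shape[1])
        D, A = D[r][:, c], A[r][:, c]
    return D, A

D = np.array([[0.5, 0.3, 0.2],
              [0.7, 0.1, 0.6],
              [0.3, 0.7, 0.4]])
A = np.eye(3)  # toy target assignment
D2, A2 = permute_pair(D, A, p=1.0)
# The permuted pair still encodes the same matching: the distances selected by
# A2 in D2 are the same multiset as those selected by A in D.
print(sorted(D2[A2 == 1]), sorted(D[A == 1]))
```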
DHN-Training Strategy
◮ We train DHN as a 2D classification task.
◮ The RMSprop optimizer is used with a learning rate of 0.0003, gradually decreased by 5% every 20,000 iterations, for 20 epochs (6 hours on a Titan XP GPU).
◮ For unbalanced labels (too many zeros in the target matrices), we weight the zero class by w0 = n1/(n0 + n1) and the one class by w1 = 1 − w0, where n0 is the number of zeros and n1 the number of ones in the target matrix.
◮ The loss function is the focal loss with a modulating factor of γ = 2.
◮ Once trained, DHN weights are fixed during the DeepMOT training.
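A binary focal loss with the class weights defined on this slide can be sketched in NumPy; this is an illustrative sketch, not the exact training code:

```python
import numpy as np

def weighted_focal_loss(pred, target, gamma=2.0, eps=1e-7):
    """Binary focal loss with class weights w0 = n1/(n0+n1), w1 = 1 - w0."""
    n1 = target.sum()
    n0 = target.size - n1
    w0 = n1 / (n0 + n1)            # small when ones are rare: down-weights zeros
    w1 = 1.0 - w0
    p_t = np.where(target == 1, pred, 1.0 - pred)   # probability of the true class
    w = np.where(target == 1, w1, w0)
    # (1 - p_t)^gamma focuses the loss on hard, misclassified entries.
    loss = -w * (1.0 - p_t) ** gamma * np.log(p_t + eps)
    return loss.mean()

# Target assignment matrix: mostly zeros, as in DHN training.
target = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
pred = np.array([[0.9, 0.1, 0.1],
                 [0.2, 0.8, 0.1]])
print(weighted_focal_loss(pred, target))  # small positive scalar
```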
DHN-Ablation Study 1/2
We ablate the architectures of DHN and compare their performance using the metrics:
- MA (Missing Assignment),
- SA (Several Assignment),
- WA (Weighted Accuracy).
[Figure: the soft assignment matrix à predicted by DHN from D is discretized by taking the row-wise and the column-wise maximum with a threshold of 0.5, giving hard-assigned prediction matrices A̅r and A̅c, which are compared against the ground-truth assignment matrix A*.]
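The hard discretization used for this comparison can be sketched as follows (row-wise maximum with threshold 0.5; the column-wise case is analogous along axis 0). The soft matrix below reuses the values shown in the figure:

```python
import numpy as np

def discretize_rowwise(A_soft, th=0.5):
    """Per row, keep the maximum entry if it exceeds th; zero elsewhere."""
    A = np.zeros_like(A_soft)
    for i, row in enumerate(A_soft):
        j = row.argmax()
        if row[j] > th:
            A[i, j] = 1.0
    return A

A_soft = np.array([[0.1, 0.1, 0.9],
                   [0.2, 0.8, 0.2],
                   [0.3, 0.3, 0.2]])   # third row: no confident assignment
print(discretize_rowwise(A_soft))
```

A row whose maximum stays below the threshold yields no assignment (a missing assignment when the ground truth has one); the metrics above count such errors.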
DHN-Ablation Study 2/2
[Figure: proposed sequential (seq) Bi-RNN DHN: row-wise flatten → seq-to-seq Bi-RNN → reshape → column-wise flatten → seq-to-seq Bi-RNN → FC layers → sigmoid → reshape into the soft assignment matrix Ã.]
[Figure: parallel (paral) Bi-RNN DHN: the row-wise and column-wise flattenings are fed to two seq-to-seq Bi-RNNs in parallel, concatenated, then passed through FC layers, a sigmoid and a reshape.]
[Figure: 1D-convolutional (1d conv) DHN: a U-Net-style stack of Conv1D layers with pooling, upsampling and skip concatenations over the row-wise flattening, ending in a Conv1D(25,1,1) + sigmoid + reshape.]

Discretization: row-wise maximum
Network             WA% (↑)  MA% (↓)  SA% (↓)
seq gru (proposed)  92.71    13.17    9.70
seq lstm            91.64    14.55    10.37
paral gru           86.84    23.50    17.15
paral lstm          71.58    42.48    22.62
1d conv             83.12    32.73    5.73

Discretization: column-wise maximum
Network             WA% (↑)  MA% (↓)  SA% (↓)
seq gru (proposed)  92.36    12.21    3.69
seq lstm            91.93    13.15    4.71
paral gru           87.24    20.56    16.67
paral lstm          72.58    39.55    23.16
1d conv             82.74    32.94    1.11

Ablation study of DHN architectures on 252,355 matrices collected during the DeepMOT training process.
DeepMOT Training-Soft Discretization
A soft discretization process replaces the argmax operation: a threshold column (row) with value δ = 0.5 is appended, followed by a row (column)-wise softmax.
[Figure: the vector a = (0.50, 0.56, 0.60); with T = 1 the softmax is nearly uniform, ≈ (0.31, 0.34, 0.35); with T = 0.01 it is close to argmax, ≈ (0.00, 0.02, 0.98).]
softmax(a_i) = exp(a_i/T) / Σ_j exp(a_j/T)    (1)
A low temperature T = 0.01 is used in the softmax activation function.
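Equation (1) with the two temperatures from this slide, applied to the example vector in the figure:

```python
import numpy as np

def softmax_T(a, T):
    e = np.exp((a - a.max()) / T)   # shift by the max for numerical stability
    return e / e.sum()

a = np.array([0.50, 0.56, 0.60])
print(softmax_T(a, T=1.0).round(2))    # nearly uniform
print(softmax_T(a, T=0.01).round(2))   # close to one-hot argmax
```

At T = 0.01 the output is almost one-hot yet still differentiable, which is what makes the discretization usable inside the loss.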
DeepMOT Training-Data Augmentation
[Figure: a ground-truth box is randomly shifted, randomly scaled up (scale > 1), or randomly scaled down (scale < 1).]
During track initialization, to mimic noisy detections in the real world, noise is added to the ground-truth (GT) bounding boxes during DeepMOT training:
◮ with a probability of 0.4 (uniform distribution), a GT box is randomly shifted by k × box height or k × box width, k ∈ [0.025, 0.05);
◮ with a probability of 0.4 (uniform distribution), a GT box is randomly scaled by λ × box height or λ × box width, λ ∈ [0.8, 1.5).
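The jitter above can be sketched as follows, with boxes as (x, y, w, h); the probabilities and ranges come from this slide, while shifting both axes at once and the random sign are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def jitter_gt_box(x, y, w, h):
    """Add shift/scale noise to a GT box to mimic noisy detections."""
    if rng.random() < 0.4:                       # random shift
        k = rng.uniform(0.025, 0.05)
        x += rng.choice([-1, 1]) * k * w
        y += rng.choice([-1, 1]) * k * h
    if rng.random() < 0.4:                       # random scale
        lam = rng.uniform(0.8, 1.5)
        w, h = lam * w, lam * h
    return x, y, w, h

boxes = [jitter_gt_box(100.0, 50.0, 40.0, 80.0) for _ in range(1000)]
ws = [b[2] for b in boxes]
print(min(ws) >= 0.8 * 40.0 and max(ws) <= 1.5 * 40.0)  # True: scale stays in range
```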
DeepMOT Training-Training Strategy
◮ We initialise tracks with noisy GT boxes at t = 0 and keep regressing next-frame bounding boxes. The tracking lasts for 10 frames.
◮ We calculate the DeepMOT loss and update the weights of the deep multi-object tracker with its gradients at each time step.
◮ We use the Adam optimizer with a learning rate of 0.0001.
◮ We train the baseline SOTs (single-object trackers) for 15 epochs (72 h), and we train Tracktor (regression head and ReID head) for 18 epochs (12 h) on a Titan XP GPU.
DeepMOT Training-Ablation Study

Training loss      MOTA↑ MOTP↑ IDF1↑ MT↑   ML↓   FP↓ FN↓   IDS↓
Vanilla            60.20 89.50 71.15 35.13 27.80 276 31827 152
Smooth L1          60.38 91.81 71.27 34.99 27.25 294 31649 164
dMOTP              60.51 91.74 71.75 35.41 26.83 291 31574 142
dMOTA              60.52 88.31 71.92 35.41 27.39 254 31597 142
dMOTA+dMOTP−ĨDS    60.61 92.03 72.10 35.41 27.25 222 31579 124
dMOTA+dMOTP        60.66 91.82 72.32 35.41 27.25 218 31545 118

Table: The effect of different components of the DeepMOT loss.

Training               MOTA↑ MOTP↑ IDF1↑ MT↑   ML↓   FP↓  FN↓   IDS↓
GOTURN   Pre-trained   45.99 85.87 49.83 22.27 36.51 2927 39271 1577
         Smooth L1     52.28 90.56 63.53 29.46 34.58 2026 36180 472
         DeepMOT       54.09 90.95 66.09 28.63 35.13 927  36019 261
SiamRPN  Pre-trained   55.35 87.15 66.95 33.61 31.81 1907 33925 356
         Smooth L1     56.51 90.88 68.38 33.75 32.64 925  34151 167
         DeepMOT       57.16 89.32 69.49 33.47 32.78 889  33667 161
Tracktor Vanilla       60.20 89.50 71.15 35.13 27.80 276  31827 152
         Smooth L1     60.38 91.81 71.27 34.99 27.25 294  31649 164
         DeepMOT       60.66 91.82 72.32 35.41 27.25 218  31545 118

Table: DeepMOT vs. Smooth L1 on the validation set.

◮ We demonstrate the merit of the different components of the DeepMOT loss.
◮ We obtain better performance with our DeepMOT loss than with a Smooth-L1-based loss ⇒ explicitly establishing the matching is important!
◮ DeepMOT can jointly train the bounding box regressor and an internal re-identification module!
Conclusion
(i) We propose a novel framework to train deep multi-object trackers.
  ◮ DHN as a differentiable alternative to the HA.
  ◮ DeepMOT loss as a proxy to the MOT metrics [2].
(ii) Detailed description of the design of the DHN and the DeepMOT loss.
(iii) Provided the training and data augmentation strategies for the DHN.
(iv) Once the DHN is learned, we describe the training of a deep multi-object tracker using the DHN and the DeepMOT loss.
(v) This allows back-propagating "through the assignment problem"!
(vi) We use the framework to train Tracktor [1] and establish a new state of the art on the MOT Challenge [4, 3].
References
[1] Philipp Bergmann, Tim Meinhardt, and Laura Leal-Taixe. Tracking without bells and whistles. ICCV, 2019.
[2] Keni Bernardin and Rainer Stiefelhagen. Evaluating multiple object tracking performance: The CLEAR MOT metrics. JIVP, 2008:1:1–1:10, 2008.
[3] Laura Leal-Taixe, Anton Milan, Ian Reid, Stefan Roth, and Konrad Schindler. MOTChallenge 2015: Towards a benchmark for multi-target tracking. arXiv preprint arXiv:1504.01942, 2015.
[4] Anton Milan, Laura Leal-Taixe, Ian Reid, Stefan Roth, and Konrad Schindler. MOT16: A benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831, 2016.