http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/
Unsupervised Learning of Object Landmarks
through Conditional Image Generation
1. OVERVIEW
“Unsupervised discovery of semantically stable landmarks for visual objects”
CONTRIBUTIONS
§ Object landmark discovery without manual annotations.
§ Outperforms state-of-the-art facial landmark detection methods with a simple method.
§ Learns from synthetically warped images / videos directly.
§ Applicable to a variety of datasets without modification: faces, humans, 3D objects, digits.
§ Method factorizes object appearance and geometry, allowing transfer of style / pose.
[Panel titles: 3. RESULTS; 4. DISENTANGLING STYLE & GEOMETRY]
2. METHOD
DISTILLING GEOMETRY
“Subtract” pairs of images that share appearance but differ in object pose / geometry. Such pairs come from:
§ Videos: frames from a video of an object.
§ Synthetically warped images: thin-plate spline warped versions of a single image.
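The synthetic-warp option above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' pipeline: the control-point grid size, the jitter scale, nearest-neighbour sampling, and all function names (`fit_tps`, `apply_tps`, `random_tps_pair`) are assumptions made for the example.

```python
import numpy as np

def tps_kernel(r2):
    # Thin-plate spline radial basis U = r^2 * log(r^2), with U(0) = 0.
    safe = np.where(r2 > 0, r2, 1.0)
    return np.where(r2 > 0, r2 * np.log(safe), 0.0)

def fit_tps(src, dst):
    """Solve for TPS parameters that map control points src -> dst, both (n, 2)."""
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((n, 1)), src])        # affine part, (n, 3)
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = tps_kernel(d2)
    L[:n, n:] = P
    L[n:, :n] = P.T
    rhs = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(L, rhs)               # (n + 3, 2)

def apply_tps(params, src, pts):
    """Evaluate the fitted spline at query points pts, shape (m, 2)."""
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return tps_kernel(d2) @ params[:-3] + P @ params[-3:]

def random_tps_pair(image, grid=4, jitter=0.05, seed=0):
    """Return (image, warped): same appearance, different geometry."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    lin = np.linspace(0.0, 1.0, grid)
    ctrl = np.stack(np.meshgrid(lin, lin), -1).reshape(-1, 2)   # (x, y) in [0, 1]
    moved = ctrl + rng.normal(scale=jitter, size=ctrl.shape)
    # Backward warp: for every output pixel, find where to sample the input.
    params = fit_tps(ctrl, moved)
    ys, xs = np.mgrid[0:h, 0:w]
    out_xy = np.stack([xs.ravel() / (w - 1), ys.ravel() / (h - 1)], -1)
    in_xy = apply_tps(params, ctrl, out_xy)
    ix = np.clip(np.rint(in_xy[:, 0] * (w - 1)), 0, w - 1).astype(int)
    iy = np.clip(np.rint(in_xy[:, 1] * (h - 1)), 0, h - 1).astype(int)
    warped = image[iy, ix].reshape(image.shape)  # nearest-neighbour sampling
    return image, warped
```

Calling `random_tps_pair(img)` on an `(H, W)` or `(H, W, C)` array yields one source / target pair. A practical version would likely use bilinear sampling and could warp both images of the pair rather than only one.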
[Figure: training input / output for HUMAN FACES, HUMAN POSE, 3D OBJECTS. Rows: source, target, reconstruction, landmarks. Label: (content loss).]
[Figure labels: unsupervised landmarks (N = 10); regressed landmarks (linear regression); AFLW Dataset (train: synthetic warps); VOXCELEB Dataset (train: video frames); unsupervised landmarks (N = 20).]
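The "regressed landmarks" labelled above are obtained by linear regression from the unsupervised landmarks to annotated ones. A minimal least-squares sketch of that step follows; the array shapes, the bias column, and the function names are assumptions of this example and are not stated on the poster.

```python
import numpy as np

def _design_matrix(unsup):
    # Flatten (n_images, K, 2) landmarks to (n_images, 2K) and append a bias column.
    n = unsup.shape[0]
    return np.hstack([unsup.reshape(n, -1), np.ones((n, 1))])

def fit_landmark_regressor(unsup, annotated):
    """Least-squares map from unsupervised to annotated landmarks.

    unsup:     (n_images, K, 2) landmarks discovered without labels
    annotated: (n_images, M, 2) manually labelled landmarks
    returns:   weight matrix of shape (2K + 1, 2M)
    """
    X = _design_matrix(unsup)
    Y = annotated.reshape(annotated.shape[0], -1)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict_landmarks(W, unsup):
    """Apply the fitted regressor; returns (n_images, M, 2)."""
    return (_design_matrix(unsup) @ W).reshape(unsup.shape[0], -1, 2)
```

Only the regressor sees annotations; the landmark detector itself stays unsupervised, so the regressor can be fitted on very few labelled examples.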
Financial support was provided by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems Grant EP/L015987/2, EPSRC Programme Grant Seebibyte EP/M013774/1, ERC 677195-IDIU, and the Clarendon Fund scholarship.
[Plot: MAFL facial landmark detection. Y-axis: IoD normalised %-MSE. Supervised methods: TCDCN, Zhang [2016]; MTCNN, Zhang [2013]. Unsupervised methods: Zhang [2018] (w/o equiv.); Thewlis [2017]; Thewlis [2017] frames; Shu [2018]; Wiles [2018]; Zhang [2018] (w/ equiv.); Ours (30 kpts); Ours (50 kpts).]
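The error axis of the MAFL plot is normalised by inter-ocular distance (IoD). The exact formula is not given here, so the sketch below takes one common reading: mean point-to-point Euclidean error per image, divided by the distance between the two annotated eye landmarks, as a percentage. The eye indices and the function name are assumptions.

```python
import numpy as np

def iod_normalised_error(pred, gt, left_eye=0, right_eye=1):
    """Mean landmark error as a percentage of inter-ocular distance.

    pred, gt: (n_images, M, 2) predicted and ground-truth landmarks.
    left_eye, right_eye: indices of the eye landmarks in gt (assumed).
    """
    # Inter-ocular distance per image, shape (n_images,).
    iod = np.linalg.norm(gt[:, left_eye] - gt[:, right_eye], axis=-1)
    # Mean Euclidean error over the M landmarks, shape (n_images,).
    err = np.linalg.norm(pred - gt, axis=-1).mean(axis=-1)
    return 100.0 * float((err / iod).mean())
```

For example, if every predicted landmark is off by 1 pixel and the eyes are 10 pixels apart, the function returns 10.0.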
[Plot: sample efficiency for supervised regression. X-axis: n supervised examples (1, 5, 10, 100, 500, 1000, 5000, 19000). Series: Thewlis [2017]; Ours.]
[Plot: bars d = 60; d = 20; d = 10; Ours (K = 30). Annotation: replace keypoint bottleneck with FC-layer.]
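The ablation above contrasts a keypoint bottleneck with a plain FC-layer. The poster text does not spell out how the bottleneck is built, so the following sketch assumes a common construction: a spatial softmax over each of the K score maps, the expected (x, y) coordinate as the landmark, and an isotropic Gaussian re-rendered around it. The function name, coordinate range, and `sigma` are hypothetical.

```python
import numpy as np

def keypoint_bottleneck(scores, sigma=0.1):
    """Reduce K score maps to K landmark coordinates and Gaussian heatmaps.

    scores:  (K, H, W) raw score maps from some encoder (assumed input).
    returns: coords (K, 2) as (x, y) in [-1, 1], heatmaps (K, H, W).
    """
    k, h, w = scores.shape
    # Spatial softmax per map (max subtracted for numerical stability).
    flat = scores.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)
    prob = np.exp(flat)
    prob = (prob / prob.sum(axis=1, keepdims=True)).reshape(k, h, w)
    # Expected coordinates under the softmax distribution.
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    mu_y = (prob.sum(axis=2) * ys).sum(axis=1)
    mu_x = (prob.sum(axis=1) * xs).sum(axis=1)
    coords = np.stack([mu_x, mu_y], axis=1)
    # Re-render one Gaussian per landmark; only position survives the bottleneck.
    dy2 = (ys[None, :, None] - mu_y[:, None, None]) ** 2
    dx2 = (xs[None, None, :] - mu_x[:, None, None]) ** 2
    heatmaps = np.exp(-(dx2 + dy2) / (2.0 * sigma ** 2))
    return coords, heatmaps
```

Because each map is collapsed to two numbers, appearance cannot pass through this path, which is the intuition behind comparing it with a d-dimensional FC-layer that has no such spatial constraint.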
Tomas Jakab*1, Ankush Gupta*1, Hakan Bilen2, Andrea Vedaldi1,3 (*equal contribution)
(1) Visual Geometry Group (VGG), University of Oxford
(2) University of Edinburgh
(3) Facebook AI Research, London
[Figure labels: BBCPose Dataset: unsupervised landmarks, regressed landmarks. Human3.6M Dataset: unsupervised landmarks. SmallNORB Dataset: azimuth, elevation, lighting, shape / instance.]
[Figure: different style “source”, geometry “target”, output. Rows: style, geometry, reconstruction.]