

http://www.robots.ox.ac.uk/~vgg/research/unsupervised_landmarks/

Unsupervised Learning of Object Landmarks through Conditional Image Generation

1. OVERVIEW

“Unsupervised discovery of semantically stable landmarks for visual objects”

CONTRIBUTIONS

§ Object landmark discovery without manual annotations. A simple method that outperforms state-of-the-art facial landmark detection methods.

§ Learns directly from synthetically warped images or from videos. Applicable to a variety of datasets without modification: faces, humans, 3D objects, digits.

§ The method factorizes object appearance and geometry, so style / pose can be transferred between images.

3. RESULTS 4. DISENTANGLING STYLE & GEOMETRY

2. METHOD

DISTILLING GEOMETRY

“SUBTRACT” pairs of images that share appearance but differ in object pose / geometry: the shared appearance cancels out, leaving the geometry.
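The following is one way to write this “subtraction” as a training objective. It is a hedged sketch, not the poster's own formulation. All notation is assumed here: x is the source image, x′ the target image, Φ the landmark detector, Ψ the conditional generator, Γ_l the feature maps of a fixed pretrained network, and α_l per-layer weights. The content loss is taken to be a perceptual loss on those feature maps.

```latex
\begin{aligned}
\hat{x}' &= \Psi\big(x,\ \Phi(x')\big) \\
\mathcal{L}(x, x') &= \sum_{l} \alpha_l \,\big\lVert \Gamma_l(\hat{x}') - \Gamma_l(x') \big\rVert_2^2
\end{aligned}
```

Because x and x′ share appearance, the generator can copy appearance from x. The only information it receives about x′ passes through the landmarks Φ(x′), which pushes them to encode geometry.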

Such pairs are obtained from:

§ Videos: frames from a video of an object.

§ Synthetically warped images: thin-plate spline warped versions of a single image.
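The sketch below illustrates the synthetic-warp pairs. It fits a 2-D thin-plate spline (TPS) that maps a small grid of control points onto randomly jittered positions, then applies that spline to arbitrary coordinates. It makes several assumptions for illustration only: the function names (including the hypothetical `random_tps_warp`), the 3×3 control grid, the jitter scale and the use of NumPy. It is not the authors' code.

```python
import numpy as np


def _tps_kernel(r2):
    # Thin-plate radial basis r^2 * log(r^2) = 2 * r^2 * log(r); the constant factor
    # is absorbed by the fitted weights. Defined as 0 at r = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r2 * np.log(r2)
    return np.where(r2 > 0, k, 0.0)


def fit_tps(src, dst):
    """Solve for TPS parameters that map control points src (n, 2) exactly onto dst (n, 2)."""
    n = src.shape[0]
    d2 = np.sum((src[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((n, 1)), src])      # affine part: [1, x, y]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = _tps_kernel(d2)                 # radial part
    L[:n, n:] = P
    L[n:, :n] = P.T                             # side conditions on the radial weights
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = dst
    params = np.linalg.solve(L, rhs)
    return params[:n], params[n:]               # radial weights (n, 2), affine (3, 2)


def apply_tps(points, src, weights, affine):
    """Map arbitrary points (m, 2) through the fitted spline."""
    d2 = np.sum((points[:, None, :] - src[None, :, :]) ** 2, axis=-1)
    P = np.hstack([np.ones((points.shape[0], 1)), points])
    return _tps_kernel(d2) @ weights + P @ affine


def random_tps_warp(points, grid=3, scale=0.1, rng=None):
    """Hypothetical helper: warp points in [-1, 1]^2 by jittering a grid of control points."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.linspace(-1.0, 1.0, grid)
    src = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
    dst = src + rng.normal(scale=scale, size=src.shape)
    weights, affine = fit_tps(src, dst)
    return apply_tps(points, src, weights, affine), (src, dst)
```

To turn this into an image pair, sample the image at the warped coordinates of every output pixel, for example with bilinear interpolation. The original image and its warped copy then share appearance but differ in geometry.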

[Figure: training input / output. Panels: source, target, reconstruction (content loss), landmarks.]

Result columns: HUMAN FACES, HUMAN POSE, 3D OBJECTS

[Figure: human faces. AFLW dataset (train: synthetic warps): unsupervised landmarks (N = 10) and landmarks regressed from them by linear regression. VoxCeleb dataset (train: video frames): unsupervised landmarks (N = 20).]

Financial support was provided by the UK EPSRC CDT in Autonomous Intelligent Machines and Systems Grant EP/L015987/2, EPSRC Programme Grant Seebibyte EP/M013774/1, ERC 677195-IDIU, and the Clarendon Fund scholarship.

[Figure: MAFL facial landmark detection. Error in IoD (inter-ocular distance) normalised %-MSE. Supervised methods: TCDCN, Zhang [2016]; MTCNN, Zhang [2013]. Unsupervised methods: Zhang [2018] (w/o equiv.); Thewlis [2017]; Thewlis [2017] (frames); Shu [2018]; Wiles [2018]; Zhang [2018] (w/ equiv.); Ours (30 kpts); Ours (50 kpts).]
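The MAFL chart reports errors normalised by the inter-ocular distance (IoD). The sketch below computes this kind of metric. It assumes the error is the mean Euclidean distance between predicted and ground-truth landmarks, divided by the distance between the two eye landmarks and expressed as a percentage. The eye indices are hypothetical parameters that must match the annotation format in use.

```python
import numpy as np


def iod_normalised_error(pred, gt, left_eye=0, right_eye=1):
    """Mean landmark error as a percentage of the inter-ocular distance (IoD).

    pred, gt: (n_images, M, 2) landmark coordinates.
    left_eye, right_eye: hypothetical indices of the two eye landmarks in gt.
    """
    per_point = np.linalg.norm(pred - gt, axis=-1)                      # (n, M) Euclidean errors
    iod = np.linalg.norm(gt[:, left_eye] - gt[:, right_eye], axis=-1)   # (n,) per-image IoD
    return 100.0 * np.mean(per_point.mean(axis=1) / iod)
```

For example, shift every predicted landmark by 1 pixel on a face whose eyes are 10 pixels apart. Under this definition the function returns 10.0.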

[Figure: sample efficiency for supervised regression. Thewlis [2017] vs. Ours as a function of the number of supervised examples n (1, 5, 10, 100, 500, 1000, 5000, 19000).]

[Figure: replacing the keypoint bottleneck with an FC layer of dimension d = 60, d = 20 or d = 10, compared with Ours (K = 30).]
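The chart contrasts a keypoint bottleneck with a fully connected (FC) layer. The sketch below shows one common way to build a keypoint bottleneck. Each of K score maps is reduced to a 2-D coordinate by a spatial softmax (soft-argmax), and the coordinates are then re-rendered as Gaussian maps. Several details are assumptions here, not taken from the poster: the softmax-then-Gaussian design, the [-1, 1] coordinate range, the width `sigma`, and the function names.

```python
import numpy as np


def soft_argmax(heatmaps):
    """heatmaps: (K, H, W) raw scores -> (K, 2) expected (x, y) coordinates in [-1, 1]."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)       # numerical stability
    prob = np.exp(flat)
    prob /= prob.sum(axis=1, keepdims=True)             # spatial softmax per keypoint
    prob = prob.reshape(k, h, w)
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    x = (prob.sum(axis=1) * xs).sum(axis=1)             # marginal over rows, expectation in x
    y = (prob.sum(axis=2) * ys).sum(axis=1)             # marginal over columns, expectation in y
    return np.stack([x, y], axis=1)


def render_gaussians(coords, h, w, sigma=0.1):
    """coords: (K, 2) in [-1, 1] -> (K, H, W) isotropic Gaussian maps centred on each keypoint."""
    xs = np.linspace(-1.0, 1.0, w)[None, None, :]
    ys = np.linspace(-1.0, 1.0, h)[None, :, None]
    dx = xs - coords[:, 0][:, None, None]
    dy = ys - coords[:, 1][:, None, None]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))
```

In this sketch the geometry passed on is just 2K numbers, which is the contrast with an FC layer of size d. In a trained model the same operations would be written in an autodiff framework so that gradients flow through the coordinates.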

Tomas Jakab*1, Ankush Gupta*1, Hakan Bilen2, Andrea Vedaldi1,3 (*equal contribution)

(1) Visual Geometry Group (VGG), University of Oxford
(2) University of Edinburgh
(3) Facebook AI Research, London

[Figure: human pose. BBCPose dataset: unsupervised landmarks and regressed landmarks. Human3.6M dataset: unsupervised landmarks.]

[Figure: 3D objects. SmallNORB dataset, with panels varying azimuth, elevation, lighting, and shape / instance.]

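[Figure: disentangling style and geometry. The output combines the style of a “source” image with the geometry of a “target” image. Panels: sources of different style, target geometry, output, and reconstruction.]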