Transcript
Page 1: Look-ahead before you leap: End-to-end active recognition by …vision.cs.utexas.edu/projects/lookahead_active/eccv16... · 2016. 10. 17.

Active recognition setting

Difficulty: unconstrained visual input

Opportunity:

• Not restricted to a single snapshot.

• Strategically acquiring new views.

[Figure: passive 1-view setting vs. active setting. Guess labels shown: "mug? bowl? pan?", "mug", "pan".]

[Figure: pipeline components: Perception, Action selection, Evidence fusion. Labels: JOINT TRAINING; Look-ahead: FORECASTING THE EFFECTS OF ACTIONS.]

Our idea: Multi-task joint training of components for active recognition + auxiliary internally supervised “look-ahead” task.
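One way to read this multi-task objective is sketched below. It is a rough illustration, not the authors' exact formulation. The symbols are assumptions introduced here: a_t is the aggregate feature at time t, \hat{a}_t is the look-ahead prediction of it made one step earlier, T is the number of views, and \lambda is a hypothetical weight on the auxiliary task. Only the ℓ2 form of the look-ahead error comes from the poster.

```latex
\mathcal{L}
  \;=\; \underbrace{\mathcal{L}_{\text{cls}}}_{\text{recognition loss}}
  \;+\; \lambda \sum_{t=2}^{T}
        \underbrace{\bigl\lVert \hat{a}_t - a_t \bigr\rVert_2}_{\text{look-ahead error}}
```

The look-ahead term is "internally supervised" in the sense that its target a_t is produced by the system itself once the next view arrives, so it needs no extra labels.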

Components of the active recognition pipeline

Our idea

[Wilkes 1992, Dickinson 1997, Borotschnig 1998, Schiele 1998, Denzler 2002, Soatto 2009, Ramanathan 2011, Aloimonos 2011, Borotschnig 2011, Wu 2015, Malmir 2015, Johns 2016, …]

Prior art: independent, often heuristic components

Closely intertwined perception, action and fusion components

High-level active RNN system architecture

[Figure: the five modules (SENSOR, AGGREGATOR, ACTOR, CLASSIFIER, LOOK-AHEAD) shown at two consecutive time steps, Time 𝑡 and Time 𝑡+1.]
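The data flow between the five modules can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' network. The layer sizes, the single-layer tanh modules, the random weights, and the names `step`, `linear`, and `history` are all assumptions made for the example. The only things taken from the poster are the module names, the recurrent aggregation over time, the sampled action, and the ℓ2 look-ahead error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
IMG_DIM, POSE_DIM, FEAT_DIM, N_ACTIONS, N_CLASSES = 32, 2, 16, 5, 10

def linear(n_in, n_out):
    """Random weight matrix standing in for a learned layer."""
    return rng.normal(scale=0.1, size=(n_in, n_out))

W_sensor = linear(IMG_DIM + POSE_DIM, FEAT_DIM)
W_agg_in, W_agg_rec = linear(FEAT_DIM, FEAT_DIM), linear(FEAT_DIM, FEAT_DIM)
W_actor = linear(FEAT_DIM, N_ACTIONS)
W_cls = linear(FEAT_DIM, N_CLASSES)
W_look = linear(FEAT_DIM + N_ACTIONS, FEAT_DIM)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(image, pose, agg_prev, look_prev):
    """One time step: SENSOR -> AGGREGATOR -> CLASSIFIER / ACTOR / LOOK-AHEAD."""
    sensed = np.tanh(np.concatenate([image, pose]) @ W_sensor)   # SENSOR
    agg = np.tanh(sensed @ W_agg_in + agg_prev @ W_agg_rec)      # AGGREGATOR
    class_probs = softmax(agg @ W_cls)                           # CLASSIFIER
    action = rng.choice(N_ACTIONS, p=softmax(agg @ W_actor))     # ACTOR samples
    one_hot = np.eye(N_ACTIONS)[action]
    look_next = np.tanh(np.concatenate([agg, one_hot]) @ W_look) # LOOK-AHEAD
    # Squared l2 error between last step's forecast and the feature observed now.
    look_err = None if look_prev is None else float(np.sum((look_prev - agg) ** 2))
    return agg, class_probs, action, look_next, look_err

agg, look_pred, pose = np.zeros(FEAT_DIM), None, np.zeros(POSE_DIM)
history = []
for t in range(3):
    image = rng.normal(size=IMG_DIM)  # stand-in for the view seen at this pose
    agg, class_probs, action, look_pred, look_err = step(image, pose, agg, look_pred)
    history.append((int(action), int(class_probs.argmax()), look_err))
    pose[0] += action - N_ACTIONS // 2  # hypothetical pose update from the action
```

The first step has no look-ahead error because no forecast exists yet. From the second view onward, the previous forecast is compared against the newly aggregated feature.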

• SUN 360 panoramas [Xiao 2012]: SCENE CATEGORIES

• GERMS toy manipulation [Malmir 2015]: HAND-HELD OBJECTS

• ModelNet-10 3-D models [Wu 2015]: 3-D CAD MODELS

Towards real-world active recognition

Complex real-world categories + easily benchmarkable setups.

Component module architectures

[Figure: module diagrams for SENSOR, AGGREGATOR, ACTOR, CLASSIFIER and LOOK-AHEAD across Time 𝑡 − 1 and Time 𝑡. Labels: pose ("Elev: 30°"), image, DELAY 1 step, history, action, aggregate feature, sample action, look-ahead ℓ2 error, predicted class likelihoods.]

End-to-end joint training with gradient descent + REINFORCE
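Sampling an action is not differentiable, which is why REINFORCE appears alongside ordinary gradient descent here. The sketch below shows the generic REINFORCE estimator for a softmax policy on a toy problem. It is not the authors' training code. The three-action setup, the reward rule, the learning rate of 0.5, and the names `reinforce_grad` and `baseline` are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def reinforce_grad(logits, action, reward, baseline=0.0):
    """REINFORCE estimate of d E[reward] / d logits from one sampled action.

    For a softmax policy, d log pi(action) / d logits = one_hot(action) - probs.
    """
    probs = softmax(logits)
    grad_log_prob = -probs
    grad_log_prob[action] += 1.0
    return (reward - baseline) * grad_log_prob

rng = np.random.default_rng(0)
logits = np.zeros(3)          # toy policy over 3 hypothetical actions
p_before = softmax(logits)

for _ in range(200):
    a = rng.choice(3, p=softmax(logits))
    r = 1.0 if a == 2 else 0.0                     # toy reward: only action 2 pays off
    logits += 0.5 * reinforce_grad(logits, a, r)   # gradient ascent on expected reward

p_after = softmax(logits)     # probability mass shifts toward action 2
```

In an active recognition system, the reward would typically depend on whether the final class prediction is correct. The differentiable modules can still be trained with standard backpropagation on their own losses.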

[Plot: SUN 360. Accuracy vs. #views (1 to 5). Methods: Passive neural net, Transinformation [Schiele98], SeqDP [Denzler03], Transinformation+SeqDP, Ours.]

[Plot: ModelNet-10. x-axis: #views (1, 3, 6). Methods: Passive neural net, ShapeNets [Wu15], Pairwise [Johns 16], Ours.]

[Plot: GERMS. x-axis: #views (1 to 3). Methods: Passive neural net, Transinformation [Schiele98], SeqDP [Denzler03], Transinformation+SeqDP, Ours.]

Our method strongly outperforms representative traditional active recognition approaches on all tasks.

Selected view sequence examples

[Figure: example view sequences with the "Top guess" shown at each step, for GERMS toy instances and SUN 360 scene categories.]

Quantitative evaluation results

Ablation studies

[Plot: SUN 360. x-axis: #views (1 to 5).]

[Plot: GERMS. x-axis: #views (1 to 3).]

Legend entries recovered: Random views (avg), Random views (recurrent).

Training all 3 components jointly is most critical to performance.

Look-ahead before you leap: End-to-end active recognition by forecasting the effect of motion

Dinesh Jayaraman and Kristen Grauman, UT Austin
