look-ahead before you leap: end-to-end active recognition by...
TRANSCRIPT
Active recognition setting
Difficulty: unconstrained visual input
Opportunity:
• Not restricted to a single snapshot.
• Strategically acquiring new views.
passive 1-view setting active setting
mug?bowl?pan?
mug
pan
mug?bowl?pan?
mug
Perception Perception
Action selection
Evidence fusion
Perception Action selection Evidence fusion
JOIN
T TR
AIN
ING
Look-ahead
FORECASTING THE EFFECTS OF ACTIONS
Our idea: Multi-task joint training of components for active recognition + auxiliary internally supervised “look-ahead” task.
Components of the active recognition pipeline
Our idea
[Wilkes 1992, Dickinson 1997, Borotschnig 1998, Schiele 1998, Denzler 2002, Soatto 2009, Ramanathan2011, Aloimonos 2011, Borotschnig 2011, Wu 2015, Malmir 2015, Johns 2016, …]
Prior art: independent, often heuristic components
Closely intertwined perception, action and fusion components
High-level active RNN system architecture
SENSOR
AGGREGATOR
ACTOR
CLASSIFIER
LOOK-AHEAD
SENSOR
AGGREGATOR
ACTOR
CLASSIFIER
LOOK-AHEAD
Time 𝑡 Time 𝑡+1
SUN 360 panoramas
GERMS toy manipulation
[Xia
o 2
01
2]
[Mal
mir
20
15
]
ModelNet-10 3-D models [Wu
20
15
]
SCENE CATEGORIES
HAND-HELDOBJECTS
3-D CADMODELS
Towards real-world active recognition
Complex real-world categories + easily benchmarkable setups.
Component module architectures
SENSOR
AGGREGATOR
ACTOR
CLASSIFIER
LOOK-AHEAD
“ El
ev: 30°”
pose image SEN
SOR
+ DELAY 1 step
History AG
GR
EGA
TOR
Look-ahead ℓ2 error
Time 𝑡 − 1 Time 𝑡
action aggregate feature
LOO
K-A
HEA
D sample action
AC
TOR
1
2
3
4
5
12
3
4
predicted class likelihoods
CLA
SSIF
IER5
End-to-end joint training with gradient descent + REINFORCE
35404550556065
1 2 3 4 5
Acc
ura
cy
#views
SUN 360
Passive neural netTransinformation [Schiele98]SeqDP [Denzler03]Transinformation+SeqDPOurs
75
80
85
90
95
1 3 6
#views
ModelNet-10
Passive neural net
ShapeNets [Wu15]
Pairwise [Johns 16]
Ours
25
30
35
40
45
50
1 2 3
#views
GERMS
Passive neural netTransinformation [Schiele98]SeqDP[Denzler03]Transinformation+SeqDPOurs
Our method strongly outperforms representative traditional active recognition approaches on all tasks.
Selected view sequence examples
Top guess Top guess Top guess
GER
MS
Toy
inst
ance
s
SUN
36
0Sc
ene
cat
ego
ries
Quantitative evaluation results
Ablation studies
40
45
50
55
60
65
1 2 3 4 5
#views
SUN 360
40
42
44
46
48
1 2 3
#views
GERMS
Random views(avg)
Random views(recurrent)
Training all 3 components jointly is most critical to performance.
Look-ahead before you leap: End-to-end active recognition by forecasting the effect of motion
Dinesh Jayaraman and Kristen GraumanUT Austin