look-ahead before you leap: end-to-end active recognition by...

Active recognition setting Difficulty: unconstrained visual input Opportunity: • Not restricted to a single snapshot. • Strategically acquiring new views. passive 1-view setting active setting mug? bowl? pan? mug pan mug? bowl? pan? mug Perception Perception Action selection Evidence fusion Perception Action selection Evidence fusion JOINT TRAINING Look-ahead FORECASTING THE EFFECTS OF ACTIONS Our idea: Multi-task joint training of components for active recognition + auxiliary internally supervised “look-ahead” task. Components of the active recognition pipeline Our idea [Wilkes 1992, Dickinson 1997, Borotschnig 1998, Schiele 1998, Denzler 2002, Soatto 2009, Ramanathan 2011, Aloimonos 2011, Borotschnig 2011, Wu 2015, Malmir 2015, Johns 2016, …] Prior art: independent, often heuristic components Closely intertwined perception, action and fusion components High-level active RNN system architecture SENSOR AGGREGATOR ACTOR CLASSIFIER LOOK-AHEAD SENSOR AGGREGATOR ACTOR CLASSIFIER LOOK-AHEAD Time Time +1 SUN 360 panoramas GERMS toy manipulation [Xiao 2012] [Malmir 2015] ModelNet-10 3-D models [Wu 2015] SCENE CATEGORIES HAND-HELD OBJECTS 3-D CAD MODELS Towards real-world active recognition Complex real-world categories + easily benchmarkable setups. Component module architectures SENSOR AGGREGATOR ACTOR CLASSIFIER LOOK-AHEAD “ Elev: 30°” pose image SENSOR + DELAY 1 step History AGGREGATOR Look-ahead ℓ 2 error Time −1 Time action aggregate feature LOOK-AHEAD sample action ACTOR 1 2 3 4 5 1 2 3 4 predicted class likelihoods CLASSIFIER 5 End-to-end joint training with gradient descent + REINFORCE 35 40 45 50 55 60 65 1 2 3 4 5 Accuracy #views SUN 360 Passive neural net Transinformation [Schiele98] SeqDP [Denzler03] Transinformation+SeqDP Ours 75 80 85 90 95 1 3 6 #views ModelNet-10 Passive neural net ShapeNets [Wu15] Pairwise [Johns 16] Ours 25 30 35 40 45 50 1 2 3 #views GERMS Passive neural net Transinformation [Schiele98] SeqDP[Denzler03] Transinformation+SeqDP Ours Our method strongly outperforms representative traditional active recognition approaches on all tasks. Selected view sequence examples Top guess Top guess Top guess GERMS Toy instances SUN 360 Scene categories Quantitative evaluation results Ablation studies 40 45 50 55 60 65 1 2 3 4 5 #views SUN 360 40 42 44 46 48 1 2 3 #views GERMS Random views (avg) Random views (recurrent) Training all 3 components jointly is most critical to performance. Look-ahead before you leap: End-to-end active recognition by forecasting the effect of motion Dinesh Jayaraman and Kristen Grauman UT Austin

Upload: others

Post on 19-Aug-2020

3 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Look-ahead before you leap: End-to-end active recognition by …vision.cs.utexas.edu/projects/lookahead_active/eccv16... · 2016. 10. 17. · 2011, Aloimonos 2011, Borotschnig 2011,

Active recognition setting

Difficulty: unconstrained visual input

Opportunity:

• Not restricted to a single snapshot.

• Strategically acquiring new views.

passive 1-view setting active setting

mug?bowl?pan?

mug

pan

mug?bowl?pan?

mug

Perception Perception

Action selection

Evidence fusion

Perception Action selection Evidence fusion

JOIN

T TR

AIN

ING

Look-ahead

FORECASTING THE EFFECTS OF ACTIONS

Our idea: Multi-task joint training of components for active recognition + auxiliary internally supervised “look-ahead” task.

Components of the active recognition pipeline

Our idea

[Wilkes 1992, Dickinson 1997, Borotschnig 1998, Schiele 1998, Denzler 2002, Soatto 2009, Ramanathan2011, Aloimonos 2011, Borotschnig 2011, Wu 2015, Malmir 2015, Johns 2016, …]

Prior art: independent, often heuristic components

Closely intertwined perception, action and fusion components

High-level active RNN system architecture

SENSOR

AGGREGATOR

ACTOR

CLASSIFIER

LOOK-AHEAD

SENSOR

AGGREGATOR

ACTOR

CLASSIFIER