Object Detection Training: An Online Learning Pipeline for Humanoid Robots
Elisa Maiettini and Giulia Pasquale
MUNICH 9-11 OCT 2018
Joint work with:
Lorenzo Natale, Lorenzo Rosasco
Talk Overview
o Computer Vision & Robotic Scenario
o Object Detection’s Challenges
o Our Online Detection Approach
o Experimental Results
o Implementation Details
Computer Vision at The Edge
Stunning performance
Huge amount of data
Long training time
Computationally demanding
…VIDEO HERE…
Full video: https://www.youtube.com/watch?v=s8Ui_kV9dhw
DeepLab: https://arxiv.org/abs/1802.02611
Mask R-CNN: https://arxiv.org/abs/1703.06870
YOLOv2 and YOLOv3: https://pjreddie.com/darknet/yolo/
Real World Robotic Scenario
Multiple sources of information (sensors, context…)
Specific data required
Need to adapt quickly
Limited computation
Our Previous Steps
Are we done with Object Recognition? The R1 perspective.
Giulia Pasquale, GTC 2017, San Jose, CA
[http://on-demand.gputechconf.com/gtc/2017/video/s7295-giulia-pasquale-are-we-done-with-object-recognition-the-r1-robot-perspective.PNG.mp4]
Natural, Interactive training of service robots to detect novel objects.
Elisa Maiettini and Giulia Pasquale, GTC 2018, Munich
[https://drive.google.com/file/d/0B596cb8D9K9kcGE4czA0NnNUTmM/preview]
Object Detection: a Computational Challenge
1) Localization
Sliding window / grid
Selective Search, Uijlings, IJCV, 2013
Region Proposal Network, S. Ren et al., NIPS, 2015 …
2) Classification
SVMs, RLS, FC Layers, …
o Binary classification problem
o Enormous dataset
o Strongly unbalanced
o Samples from the majority class are:
• Redundant
• Tons of easy samples
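To make the imbalance concrete, here is a minimal sketch that labels sliding-window candidates by their overlap (IoU) with a single ground-truth box. The image size, window size, stride and the 0.5 IoU threshold are illustrative assumptions, not values taken from the talk.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def sliding_windows(width, height, size, stride):
    """Square candidate windows laid out on a regular grid."""
    xs = np.arange(0, width - size + 1, stride)
    ys = np.arange(0, height - size + 1, stride)
    gx, gy = np.meshgrid(xs, ys)
    x1, y1 = gx.ravel(), gy.ravel()
    return np.stack([x1, y1, x1 + size, y1 + size], axis=1).astype(float)

# Hypothetical ground-truth box and candidate grid (assumed values).
gt = np.array([100.0, 100.0, 200.0, 200.0])
windows = sliding_windows(640, 480, size=100, stride=20)

# Candidates overlapping the object enough are positives, the rest negatives.
positives = iou(gt, windows) >= 0.5
print(positives.sum(), "positives vs", (~positives).sum(), "negatives")
```

Even on this toy grid only a handful of windows are positives, and most negatives lie far from the object, which is what makes them redundant and easy.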
Object Detection: a Computational Challenge
Detector
Where?
What?
Region Based Approaches
1) R-CNN, Girshick R. et al., 2014
2) Faster R-CNN, Ren S. et al., 2015
Standard Offline Pipeline
Feature Extractor
Region Proposal
[Figure: detections labeled "Apple"]
RGB Image
Annotation
Data acquisition and annotation
Train detection model
Hours/days of training…
Detection Model
• Human annotation
• Long training time
• Slow adaptation
Our Online Detection Approach at a Glance
Data acquisition and automatic annotation [3]
Train detection model
Few seconds of training [4]…
Detection Model
[3] E. Maiettini et al., Humanoids, 2017 [4] E. Maiettini et al., IROS, 2018
Annotation
RGB Image
Automatic Data Acquisition
RGB Image
Segmentation
Output
Bounding boxes
Labels
iCubWorld [5]
https://robotology.github.io/iCubWorld
[5] Pasquale et al., IROS, 2016
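The step from a segmentation to a bounding-box annotation can be sketched as follows. This is only an illustration under assumptions: the segmentation is taken to be a binary mask of the object, and the label string is a placeholder for whatever label accompanies the acquisition. The function name `mask_to_bbox` is hypothetical.

```python
import numpy as np

def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy binary segmentation of one object in a 640x480 image (assumed data).
mask = np.zeros((480, 640), dtype=bool)
mask[120:200, 300:380] = True

# One automatically generated annotation: a label plus a bounding box.
annotation = {"label": "mug", "bbox": mask_to_bbox(mask)}
print(annotation)
```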
Proposed Learning Pipeline
Feature extraction module (Faster R-CNN architecture)
Feature extractor: Convolutional Layers, Region Proposal Network
Detection module (R-CNN approach)
Fast classifier: FALKON + Minibootstrap
Bbox refinement: Regularized Least Squares
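As an illustration of the bounding-box refinement step, the sketch below fits a Regularized Least Squares regressor in closed form on R-CNN style box targets. The feature dimension, the regularization value and the toy data are assumptions, and the helper names `bbox_targets` and `rls_fit` are hypothetical.

```python
import numpy as np

def bbox_targets(proposals, gt_boxes):
    """R-CNN style regression targets (dx, dy, dw, dh) for boxes given as (x1, y1, x2, y2)."""
    pw = proposals[:, 2] - proposals[:, 0]
    ph = proposals[:, 3] - proposals[:, 1]
    px = proposals[:, 0] + 0.5 * pw
    py = proposals[:, 1] + 0.5 * ph
    gw = gt_boxes[:, 2] - gt_boxes[:, 0]
    gh = gt_boxes[:, 3] - gt_boxes[:, 1]
    gx = gt_boxes[:, 0] + 0.5 * gw
    gy = gt_boxes[:, 1] + 0.5 * gh
    return np.stack([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)], axis=1)

def rls_fit(X, Y, lam=1.0):
    """Closed-form regularized least squares: W = (X^T X + lam * I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Toy usage: random region features and slightly perturbed ground-truth boxes.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
proposals = np.tile([10.0, 10.0, 60.0, 60.0], (200, 1))
gt_boxes = proposals + rng.normal(scale=2.0, size=(200, 4))

W = rls_fit(features, bbox_targets(proposals, gt_boxes))
refinement = features @ W  # predicted (dx, dy, dw, dh) for each region
```

The closed-form solve is what keeps this step cheap: training amounts to one linear system whose size depends on the feature dimension, not on the number of regions.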
Minibootstrap
o Subsampling + splitting negatives
o First model train: Train(P ∪ Nchosen_1) → M1
FOR EACH batch i
• Select hard negatives: Mi-1 TEST ON Bi → Bi_hard
• Train ith model: Train(Bi_hard ∪ Nchosen_i-1 ∪ P) → Mi
• Prune easy negatives: Mi TEST ON (Bi ∪ Nchosen_i-1) → Nchosen_i
And now repeat!
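The Minibootstrap steps listed above can be sketched as a short loop. This is a minimal illustration under several assumptions, not the authors' implementation: a linear regularized least squares classifier stands in for FALKON, the first batch plays the role of the initial chosen negatives, the two score thresholds are hypothetical, and pruning is applied to the negatives actually used for training, which is one reading of the slide.

```python
import numpy as np

def train_linear_rls(X, y, lam=1e-3):
    """Stand-in classifier (NOT FALKON): linear regularized least squares on +/-1 labels."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def fit_model(P, negatives):
    """Train on all positives P plus the given negatives."""
    X = np.vstack([P, negatives])
    y = np.concatenate([np.ones(len(P)), -np.ones(len(negatives))])
    return train_linear_rls(X, y)

def minibootstrap(P, N, n_batches=10, batch_size=2000,
                  hard_thresh=-0.9, easy_thresh=-0.7, seed=0):
    rng = np.random.default_rng(seed)
    # Subsampling + splitting negatives into batches.
    keep = rng.permutation(len(N))[: n_batches * batch_size]
    batches = np.array_split(N[keep], n_batches)
    # First model train on the positives and the first batch.
    chosen = batches[0]
    w = fit_model(P, chosen)
    for B in batches[1:]:
        # Select hard negatives: batch samples the previous model scores too high.
        hard = B[B @ w > hard_thresh]
        # Train the i-th model on positives, chosen negatives and new hard negatives.
        train_neg = np.vstack([chosen, hard])
        w = fit_model(P, train_neg)
        # Prune easy negatives: keep only those the new model still scores high.
        chosen = train_neg[train_neg @ w > easy_thresh]
    return w, chosen

# Toy usage on synthetic 2-D features (assumed data).
rng = np.random.default_rng(1)
P = rng.normal(loc=2.0, size=(50, 2))
N = rng.normal(loc=-2.0, size=(5000, 2))
w, chosen = minibootstrap(P, N, n_batches=5, batch_size=200)
```

The point of the loop is that each model only ever sees a small, informative subset of the negatives, so the training set stays bounded regardless of how many negatives were collected.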
FALKON: An Optimal Large Scale Kernel Method
• Kernel method efficient for large-scale datasets;
• Accurate classifier (statistical bounds mathematically proved in [6]);
• Combines stochastic data subsampling (Nyström method) with iterative solvers and preconditioning.
[6] Rudi A. et al., NIPS, 2017
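To give an idea of the Nyström ingredient, the sketch below implements plain Nyström kernel ridge regression with a Gaussian kernel. It is not FALKON itself: the linear system is solved directly with a least-squares routine rather than with the preconditioned iterative solver described in [6], and the kernel width, number of centers and regularization are assumed values.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def nystrom_krr_fit(X, y, m=100, lam=1e-4, sigma=1.0, seed=0):
    """Kernel ridge regression restricted to m randomly subsampled centers."""
    rng = np.random.default_rng(seed)
    m = min(m, len(X))
    centers = X[rng.choice(len(X), size=m, replace=False)]
    K_nm = gaussian_kernel(X, centers, sigma)
    K_mm = gaussian_kernel(centers, centers, sigma)
    # Normal equations of the subsampled problem:
    # (K_nm^T K_nm + lam * n * K_mm) alpha = K_nm^T y
    A = K_nm.T @ K_nm + lam * len(X) * K_mm
    alpha = np.linalg.lstsq(A, K_nm.T @ y, rcond=None)[0]
    return centers, alpha

def nystrom_krr_predict(X_new, centers, alpha, sigma=1.0):
    """Evaluate the fitted function on new points."""
    return gaussian_kernel(X_new, centers, sigma) @ alpha
```

With m centers and n training points, the system above is only m x m, so the cost is driven by the number of centers rather than by the full n x n kernel matrix.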
Experimental setup
PRE-TRAIN TASK: Feature extraction module
TARGET TASK: Detection module
Fast classifier: FALKON + Minibootstrap
Bbox refinement: Regularized Least Squares
BACKBONE NETWORK: ZF [8], Resnet50 [9], Resnet101 [9]
DATASET: Pascal VOC dataset [6], iCubWorld-Transformations [7]
[6] http://host.robots.ox.ac.uk/pascal/VOC/
[7] https://robotology.github.io/iCubWorld/
[8] Visualizing and understanding convolutional networks, CoRR, M. D. Zeiler et al.
[9] Deep Residual Learning for Image Recognition, K. He et al.
Experiments on Pascal VOC Dataset
Pre-train task: VOC 2007 + VOC 2012
Target task: VOC 2007 + VOC 2012
Feature extraction module: Resnet101
Detection module:
Fast classifier: FALKON + Minibootstrap
Bbox refinement: Regularized Least Squares

Method                              mAP (%)   Train time
Faster R-CNN                        74.3      3h 15m
FALKON + Full bootstrap             75.1      55m
FALKON + Minibootstrap (10x2000)    70.4      1m 40s
Experiments on iCubWorld: Setup
Pre-train task: 100 objects from iCubWorld
Target task: 30 objects from iCubWorld
Feature extraction module: Resnet50
Detection module:
Fast classifier: FALKON + Minibootstrap
Bbox refinement: Regularized Least Squares

Method                              mAP (%)   Train time
Faster R-CNN last layers            51.7      4h
FALKON + Minibootstrap (10x2000)    51.7      33s
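Taking the reported train times at face value, the speedup of FALKON + Minibootstrap over the Faster R-CNN baselines is simple arithmetic. The helper `to_seconds` is a hypothetical convenience function.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a duration to seconds."""
    return 3600 * hours + 60 * minutes + seconds

# Pascal VOC: Faster R-CNN 3h 15m vs FALKON + Minibootstrap 1m 40s.
voc_speedup = to_seconds(hours=3, minutes=15) / to_seconds(minutes=1, seconds=40)

# iCubWorld: Faster R-CNN last layers 4h vs FALKON + Minibootstrap 33s.
icub_speedup = to_seconds(hours=4) / to_seconds(seconds=33)

print(f"Pascal VOC: ~{voc_speedup:.0f}x faster, iCubWorld: ~{icub_speedup:.0f}x faster")
```

That is roughly two orders of magnitude on both datasets, at a cost of 3.9 mAP points on Pascal VOC and no mAP loss on iCubWorld.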
Experiments on iCubWorld: Some Results
Experiments on iCubWorld: More Results
R1, your Personal Humanoid Robot
Sensors
• IMU
• Intel RealSense
• 2x RGB cameras
• Sensorized skin
• 2x LIDAR
Motion
• 2 wheels
• Torso elongation: 20 cm
• Arms elongation: 13 cm
• Li-ion battery: 3 hours
Software
• CAFFE
• C++
• Python
• MATLAB
Computation
• 2x NVIDIA Jetson TX2
• Intel i7
Fully Autonomous Platform!
For Extra Computation Power!
• NVIDIA GeForce 1080 Ti
Application pipeline
Modules:
• Feature/Region Extractor
• Automatic GT Extractor
• Detector
• Visualizer
• State Machine
• Gaze Ctrl
• Speech Recognizer
Data exchanged between modules:
• RGB Image
• Depth Image
• GT boxes
• Feature for each region
• Predictions
• Verbal commands
• Speech commands
• Gaze commands
• Commands
• Actions
Example label: Mug
…VIDEO HERE…
Deploying Our Pipeline on R1…
Conclusions
o Design and implementation of an online object detection learning pipeline that can be trained in a few seconds
o Deployment on the R1 humanoid robot, thanks to NVIDIA acceleration
Future work
o Further exploit context information
o Design a fully autonomous learning pipeline