PAUL STURGESS AND SUNANDO SENGUPTA
OXFORD BROOKES UNIVERSITY
ICRA 2015
Semantic Octree: Unifying Recognition, Reconstruction and
Representation via an Octree Constrained Higher Order MRF
*Joint First Author, {paul.sturgess.cv,sunando.sengupta}@gmail.com
Semantic Octree
Recognition
Structured prediction is widely adopted in vision: AHRF [1].
Efficiency of the output structure is not the focus.
Reconstruction
Octrees are widely adopted in robotics: OctoMap [2].
Incorporating high-level semantic information is not the focus.
Unifying Representation
Complementary to recognition and reconstruction. Efficient for further manipulation of the underlying data.
Combine OctoMap and AHRF to get the best of both.
[1] P. Kohli et al., Robust Higher Order Potentials for Enforcing Label Consistency.
[2] A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.
Recognition
● AHRF - Associative Higher-order Random Fields Framework.
● Multi-resolution approach to semantic image segmentation.
● Efficient and bounded inference with alpha-expansion.
Reconstruction
The main elements of an occupancy-based scene reconstruction are:
● Occupied: objects present in the world.
● Free: required for collision avoidance and path planning.
● Unmapped: unknown areas in the scene that need to be avoided.
Representation
• Efficient access to, and manipulation of, 3D object models is at the heart of robotics.
o Point clouds, meshes: cannot map free and unknown areas.
o Stixels / height maps / 2.5D: one height value per cell of a 2D grid; free area is not accurately mapped.
o Fixed-size grid of voxels: voxels are not indexed, which makes it inefficient.
• Octree-based volumetric representation
o Represents 3D space accurately, with efficient indexing of the volume.
Image courtesy: A. Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees.
Semantic Octree - framework
Input stereo images
Chap 6, Sec 6.3
Semantic Octree - framework
Generate point clouds and class hypotheses for every pixel.
Semantic Octree - framework
Fuse into an octree using the estimated camera poses.
Octree: each volume is subdivided into 8 sub-volumes.
Leaf nodes (xi) are the smallest voxels.
Any internal node (xc) gives a natural grouping of 3D space.
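The subdivision and grouping described above can be illustrated with integer voxel keys. This is only a minimal sketch, not the authors' implementation: the function names, the 16-level depth, the 8 cm leaf size and the choice of centring the map on the origin are all assumptions made for illustration.

```python
import math
from collections import defaultdict

MAX_DEPTH = 16   # assumed tree depth (leaf level)
LEAF_RES = 0.08  # assumed leaf voxel size in metres


def leaf_key(point, res=LEAF_RES, depth=MAX_DEPTH):
    """Map a 3D point (metres) to the integer key of its leaf voxel (xi)."""
    half = 1 << (depth - 1)  # shift so the map is centred on the origin
    key = tuple(math.floor(c / res) + half for c in point)
    if not all(0 <= k < (1 << depth) for k in key):
        raise ValueError("point lies outside the mapped volume")
    return key


def ancestor_key(key, level, depth=MAX_DEPTH):
    """Key of the internal node (xc) at `level` that contains leaf `key`."""
    shift = depth - level
    return tuple(k >> shift for k in key)


def child_index(key, level, depth=MAX_DEPTH):
    """Which of the 8 sub-volumes of its parent the node at `level` occupies."""
    bit = depth - level
    return sum(((k >> bit) & 1) << axis for axis, k in enumerate(key))


def group_by_ancestor(keys, level):
    """Group leaf keys by their internal node at `level` (one group per node)."""
    groups = defaultdict(list)
    for k in keys:
        groups[ancestor_key(k, level)].append(k)
    return dict(groups)
```

Dropping the low bits of a leaf key yields the key of its enclosing internal node, so grouping leaves under a common internal node needs no search through the tree.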
Semantic Octree - framework
Perform inference over 3D voxels to give a labelled scene.
CRF graph on Octree voxels
The octree divides the space into sub-volumes indexed through a tree with nodes:
τint: internal nodes in the tree (xc)
τleaf: leaf-level voxels (xi)
A random variable is defined for every leaf voxel.
Every internal node is associated with a set of leaf voxels, resulting in a clique.
Label set defined as:
Final energy:
Octree volume update
All voxels are initially set to unknown, with occupancy probability P(xi) = 0.5 and log odds L(xi) = log(P(xi) / (1 − P(xi))) = 0.
For each 3D point (obtained from stereo pairs), the voxels' log odds are updated in a ray-casting manner.
Log odds are updated for all 3D points of every stereo pair.
The final occupancy probability is obtained by inverting the log odds: P(xi) = 1 − 1 / (1 + exp(L(xi))).
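The update above can be sketched in a few lines, following the standard OctoMap-style log-odds scheme. This is an illustrative sketch under stated assumptions rather than the authors' code: the hit and miss probabilities (0.7 and 0.4), the 8 cm resolution, the function names, and the use of uniform sampling along the ray instead of an exact voxel traversal are all assumptions.

```python
import math
from collections import defaultdict


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def prob(log_odds):
    """Inverse of logit: occupancy probability from log odds."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


L_HIT = logit(0.7)   # assumed increment for the voxel containing the 3D point
L_MISS = logit(0.4)  # assumed (negative) increment for voxels the ray passes through


def voxel_key(point, res):
    return tuple(math.floor(c / res) for c in point)


def integrate_point(log_odds, origin, point, res=0.08):
    """Update log odds along the ray from the camera origin to one 3D point."""
    end_key = voxel_key(point, res)
    length = math.dist(origin, point)
    n_steps = max(1, int(length / (0.5 * res)))  # sample every half voxel
    traversed = set()
    for s in range(n_steps):
        t = s / n_steps
        sample = tuple(o + t * (p - o) for o, p in zip(origin, point))
        key = voxel_key(sample, res)
        if key != end_key:
            traversed.add(key)
    for key in traversed:          # each traversed voxel is updated once per ray
        log_odds[key] += L_MISS
    log_odds[end_key] += L_HIT


# Unknown voxels default to log odds 0, i.e. P = 0.5.
log_odds = defaultdict(float)
integrate_point(log_odds, (0.0, 0.0, 0.0), (0.42, 0.0, 0.0))
```

Because the updates are additive in log-odds space, observations from successive stereo pairs can be fused by calling `integrate_point` repeatedly on the same map.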
Unary score for leaf voxels
Chap 6, Sec 6.3.1
Each occupied voxel xi is associated with a set of 3D points.
The corresponding image pixels are denoted as
Pixel scores are combined together.
Given the initial occupancy P(xi), the unary is given as:
Thus, every voxel initially estimated as occupied has a low cost for the free label, and vice versa.
Unary score for leaf voxels
A Robust P^N potential is applied over hierarchical groupings of voxels.
It penalises label inconsistency within each grouping of voxels.
Takes the form
Maximum cost is truncated to γmax.
Groupings of voxels correspond to internal nodes in the octree.
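To make the behaviour of such a potential concrete, the sketch below evaluates the Robust P^N cost in the form given by Kohli et al. [1], min{ min_k ((N − n_k) θ_k + γ_k), γmax } with θ_k = (γmax − γ_k) / Q, for one group of leaf voxels. It is a hedged illustration: the function name, the truncation parameter `q` and the per-label costs `gamma` are assumptions, and the exact parameterisation used in this work may differ.

```python
from collections import Counter


def robust_pn_cost(labels, gamma_max, q, gamma=None):
    """Robust P^N cost for the leaf labels grouped under one internal node.

    labels    : labels currently assigned to the leaf voxels of the group
    gamma_max : maximum (truncated) cost
    q         : number of disagreeing voxels at which the cost saturates
    gamma     : optional per-label cost of a fully consistent group (default 0)
    """
    n = len(labels)
    if not 0 < q <= n:
        raise ValueError("q must satisfy 0 < q <= number of voxels")
    gamma = gamma or {}
    best = gamma_max
    # Labels absent from the group cannot beat gamma_max when q <= n,
    # so only the labels actually present need to be checked.
    for label, n_k in Counter(labels).items():
        g_k = gamma.get(label, 0.0)
        theta_k = (gamma_max - g_k) / q
        best = min(best, (n - n_k) * theta_k + g_k)
    return best


consistent = ["road"] * 8
one_outlier = ["road"] * 7 + ["car"]
mixed = ["road"] * 4 + ["car"] * 4
costs = [robust_pn_cost(g, gamma_max=1.0, q=2) for g in (consistent, one_outlier, mixed)]
```

The cost grows linearly with the number of voxels that disagree with the dominant label and then saturates, so a few outliers are tolerated rather than forcing the whole group to one label.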
Hierarchical tree potential
Chap 6, Sec 6.3.2
Experiments
Octree defined with 16 levels.
Smallest voxel resolution = (8 × 8 × 8) cm^3.
Maximum mapped volume = (2^16 × 8 cm)^3 ≈ (5.24 km)^3.
Hierarchical groupings of voxels corresponding to internal nodes at levels 13-15 are considered.
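The figures above follow from simple arithmetic, which the short sketch below reproduces. It assumes that level 16 is the leaf level and that each level up doubles the voxel side length; the variable and function names are illustrative only.

```python
DEPTH = 16    # number of octree levels (leaf level assumed to be level 16)
LEAF_CM = 8   # side length of the smallest voxel in centimetres

side_cm = (2 ** DEPTH) * LEAF_CM   # side of the whole mapped cube in cm
side_km = side_cm / 100_000        # 1 km = 100,000 cm
volume_km3 = side_km ** 3          # volume of the mapped cube in km^3


def node_side_cm(level, depth=DEPTH, leaf_cm=LEAF_CM):
    """Side length of a node at `level`, doubling with each level above the leaves."""
    return leaf_cm * 2 ** (depth - level)


group_sides = {level: node_side_cm(level) for level in (13, 14, 15)}
```

Under these assumptions the mapped cube has a side of about 5.24 km, and the groupings at levels 13 to 15 correspond to cubes with sides of 64 cm, 32 cm and 16 cm.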
Results
Hierarchical grouping during inference vs. leaf-level voxel labelling (much sparser)
Chap 6, Sec 6.4
Quantitative evaluation: performed by projecting into the image domain.
Observations: small objects tend to get decimated due to octree quantization, while the mesh-based representation is better at representing surfaces.
Results
[1] Sengupta et al., "Urban 3D semantic modelling using stereo vision," in ICRA, 2013.
[2] Valentin et al., "Mesh based semantic modelling for indoor and outdoor scenes," in CVPR, 2013.
[Comparison against methods [1] and [2]]
Occupancy mapping
Grouping voxels hierarchically increases the occupied volume, reducing sparsity.
Conclusion
● Proposed a method that performs reconstruction in an efficient representation, aided by the semantics of the scene.
● Combined AHRF and OctoMap to get the best of both.
● Some future applications
○ Scene interaction and manipulation.
○ Collision detection with known object types.
○ Path planning with known affordances.