Shape based Object Class Recognition from 3D CAD Models

Walter Wohlkinger and Aitor Aldoma and Markus Vincze

Fig. 1: The main idea of this paper is to use large collections of 3D CAD models as knowledge representations to explain the world. Hierarchically organized 3D models in large numbers form the basis of the system. Linking to WordNet and additional class-specific information such as context, size, appearance and applicable force allows for semantic manipulation of object categories. A synthetic view generation process enables the matching algorithms to be trained on these synthetic models. Given objects captured by a 3D sensor as shown in the topmost row, the shape descriptors enable the matching of the sensed point clouds to synthetic views that approximately resemble the sensed object.

I. ABSTRACT

The domestic setting with its plethora of categories and their intraclass variety demands great generalization skills from a service robot. The categories are characterized mostly by their shape, ranging from low intraclass diversification, as in the case of fruits and simple objects like bottles, up to the high intraclass variety of classes such as liquid containers, furniture, and especially toys. 3D object and object class recognition gained momentum with the arrival of low-cost RGB-D sensors and enables robotic tasks not feasible years ago. With these robots starting to tackle real-world scenarios, fast and reliable object and object class recognition is needed. Especially in robotic manipulation, where object recognition and object classification have to work from all possible viewpoints of an object, data collection for training becomes a bottleneck. Scaling object class recognition to hundreds of classes still requires extensive time and many objects for learning.

To overcome the training issue, we introduce a methodology for learning 3D descriptors from synthetic CAD models for classification of objects at first glance, where classification rates and speed are suited to robotics tasks.

We provide this in 3DNet (3d-net.org), a free resource for object class recognition and 6DOF pose estimation from point cloud data. 3DNet provides a large-scale hierarchical CAD-model database with increasing numbers of classes and difficulty, with 10, 50, 100 and 200 object classes, together with evaluation datasets that contain thousands of scenes captured with an RGB-D sensor. 3DNet further provides an open-source framework based on the Point Cloud Library (PCL) for testing new descriptors and benchmarking state-of-the-art descriptors, together with pose estimation procedures to enable robotics tasks such as search and grasping.

The proposed system requires only a 3D sensor such as a Microsoft Kinect or Asus Xtion and is able to deliver object classification results on a standard consumer notebook at 10 frames per second for scenes with objects on a flat support plane. With additional pose alignment, scale calculation and a scale-invariant grasp planner, robotic grasping of categories can be tackled.

Wohlkinger, Aldoma and Vincze are with the Vision4Robotics Group, Automation and Control Institute, Vienna University of Technology, Austria. [ww,aa,mv]@acin.tuwien.ac.at
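The core matching step described above — comparing the shape descriptor of a sensed point cloud against descriptors computed from synthetic views of CAD models — can be sketched as a nearest-neighbor search over descriptor histograms. The snippet below is a minimal illustration only: the 4-bin "descriptors", class labels and L1 distance are stand-ins for the real shape descriptors and matching strategy, not the paper's actual implementation.

```python
import numpy as np

def normalize_hist(h):
    """L1-normalize a descriptor histogram so bin counts become proportions."""
    h = np.asarray(h, dtype=float)
    s = h.sum()
    return h / s if s > 0 else h

def classify(query, database):
    """Return the class label of the synthetic view whose descriptor is
    closest (L1 distance) to the query descriptor.

    database: list of (label, descriptor) pairs, one per synthetic view.
    """
    q = normalize_hist(query)
    best_label, best_dist = None, np.inf
    for label, desc in database:
        d = np.abs(q - normalize_hist(desc)).sum()
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Toy 4-bin "descriptors" standing in for real shape histograms.
db = [
    ("mug",    [8, 1, 1, 0]),
    ("bottle", [1, 8, 1, 0]),
    ("bowl",   [1, 1, 8, 0]),
]
print(classify([7, 2, 1, 0], db))  # closest to the "mug" prototype
```

In practice the database holds many views per model and many models per class, so the nearest view simultaneously yields a class hypothesis and an approximate viewpoint, which is what enables the subsequent pose alignment and grasping steps.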


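The synthetic view generation mentioned in the figure caption requires placing virtual cameras around each CAD model before rendering partial views. One common way to obtain approximately uniform viewpoints is a Fibonacci lattice on the unit sphere; this is a generic sketch of that idea, not necessarily the exact sampling scheme used by the authors.

```python
import numpy as np

def sphere_viewpoints(n):
    """Approximately uniform camera positions on the unit sphere
    (Fibonacci lattice); each point is a virtual viewpoint from which
    a CAD model would be rendered into a partial 2.5D view."""
    i = np.arange(n)
    golden = (1 + 5 ** 0.5) / 2
    z = 1 - (2 * i + 1) / n            # heights spread in (-1, 1)
    r = np.sqrt(1 - z * z)             # circle radius at height z
    theta = 2 * np.pi * i / golden     # golden-angle spacing in azimuth
    return np.stack([r * np.cos(theta), r * np.sin(theta), z], axis=1)

views = sphere_viewpoints(80)
print(views.shape)   # (80, 3)
```

Rendering the model from each such viewpoint produces the partial point clouds that mimic what a single RGB-D sensor would observe, which is why descriptors trained on these views can match real sensor data.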