"fast 3d object recognition in real-world environments," a presentation from vangogh...
TRANSCRIPT
Copyright © 2014 VanGogh Imaging
Ken Lee, CEO
May 29, 2014
Fast 3D Object Recognition
In Real-World Environments
Company Background
• Founded in 2007
• Located in McLean, VA
• Mission: “Provide real-time 3D computer vision technology for embedded and mobile applications”
• Product: ‘Starry Night’ 3D-CV Middleware
• Operating Systems: Android and Linux
• 3D Sensors: PrimeSense, Kinect, and SoftKinetic
• Processors: ARM and Xilinx Zynq
• Applications
• 3D Printing, Parts Inspection, Robotics
• Security, Automotive, Augmented Reality
• Medical, Gaming
Starry Night 3D Middleware
The ‘Starry Night’ Middleware (Unity Plugin)
• Works in busy real-world environments
• Real-time processing
• Tolerant to noise from low-cost scanners
• Efficient
• Fully automated
• Runs on mobile or portable embedded platforms (ARM & Xilinx Zynq FPGA)
• Released on the Avnet Embedded Software Store: June 2014
Starry Night video: https://www.youtube.com/watch?v=Ro1mv007MHo&feature=youtu.be
The ‘Starry Night’ Middleware Blocks
The ‘Starry Night’ Shape-Based Registration
• Reliable — the output is always a fully formed 3D model with known feature points, despite noisy or partial scans
• Easy to use — fully automated process
• Powerful — known data structure for easy analysis and measurement
• Fast — single-step process (not iterative)
Input Scan (Partial) + Reference Model = Full 3D Model
Object Recognition Algorithm
Challenges — Scene
• Busy scene, object orientation, and occlusion
Challenges — Platform
• Mobile and embedded devices
• ARM — A9 or A15, <1 GB RAM
• Existing libraries were built for laptop/desktop platforms
• GPU processing is not always available
• Therefore, we need a very efficient algorithm
Previous Approaches Tried
• Texture-based methods
• Color-based — depends heavily on lighting and the color of the object
• Machine learning — robust, but requires training for each object
• Neither method provides a transform (i.e., orientation)
• 3D methods
• Hough transform — slow
• Geometric hashing — even slower
• Tensor matching — not good for noisy and sparse scenes
• Correspondence-based methods using rigid geometric descriptors — the models must have distinctive feature points, which is not true for most models (e.g., a cylinder)
General Concept
Diagram: the reference object’s descriptor — the distance and surface normals of randomly sampled point pairs — is compared against point pairs in the scene by distance and normal. Pairs that meet the match criteria are then fine-tuned for orientation, location, and transform.
Block Diagram — Example for One Model
Model Descriptor (Pre-processed)
1. Sample all point pairs in the model that are separated by the same distance D.
2. Use the surface normals of each pair to group the pairs into a hash table:

key           point pairs
(α1, β1, Ω1)  P1,P2   P3,P4
(α2, β2, Ω2)  P5,P6   P7,P8   P9,P10   P11,P12
(α3, β3, Ω3)  P13,P14

Note: In the bear example, D = 5 cm, which resulted in 1000 pairs.
Note: The keys are angles derived from the normals of the points:
alpha (α) = angle from the first normal to the second point
beta (β) = angle from the second normal to the first point
omega (Ω) = angle of the plane between the two points
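The pre-processing step can be sketched in Python. This is a minimal sketch, not VanGogh's implementation: only D = 5 cm and the (α, β, Ω) key come from the slides, while the distance tolerance, the angle quantization step, and all helper names are assumptions.

```python
import math
from collections import defaultdict

D = 0.05                 # pair distance in meters (5 cm, as in the bear example)
TOL = 0.002              # distance tolerance (assumed)
STEP = math.radians(15)  # angle quantization step for hash keys (assumed)

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a):   return math.sqrt(dot(a, a))
def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)

def angle(u, v):
    # Angle between two unit vectors, clamped against rounding error.
    return math.acos(max(-1.0, min(1.0, dot(u, v))))

def pair_key(p1, n1, p2, n2):
    """Quantized (alpha, beta, omega) key for an oriented point pair:
    alpha = angle from the first normal to the direction of the second point,
    beta  = angle from the second normal to the direction of the first point,
    omega = angle between the two normals (standing in for the slide's
            'angle of the plane between two points')."""
    alpha = angle(n1, unit(sub(p2, p1)))
    beta = angle(n2, unit(sub(p1, p2)))
    omega = angle(n1, n2)
    return tuple(int(a / STEP) for a in (alpha, beta, omega))

def build_descriptor(points, normals):
    """Hash table mapping quantized angle keys to the list of point-index
    pairs separated by (approximately) the distance D."""
    table = defaultdict(list)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(norm(sub(points[i], points[j])) - D) <= TOL:
                table[pair_key(points[i], normals[i],
                               points[j], normals[j])].append((i, j))
    return table
```

Because the same `pair_key` function is reused at recognition time, matching scene pairs land in the same bucket regardless of how the quantization boundaries fall.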
Object Recognition of the Model (Real-time)
1. Grab the scene.
2. Sample a point pair with distance D using RANSAC.
3. Generate a key using the same hash function.
4. Use the key to retrieve similarly oriented point pairs in the model and a rough transform.
5. Apply the match criteria to find the best match.
6. Use ICP to refine the transform.
Note: The example scene has around 16K points.
Note: We iterated this sampling process 100 times.
Note: The entire process can be easily parallelized.
Very important: Multiple models can be found using a single hash table for each sampled point pair in the scene.
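The real-time loop can be sketched as follows. This is a sketch under assumptions, not the shipped implementation: it reuses a hypothetical quantized (α, β, Ω) hash function and tolerance values, uses simple vote counting as a stand-in for the slides' match criteria, and omits the rough-transform estimation and ICP refinement steps; the 100-iteration sampling count comes from the slides.

```python
import math
import random
from collections import defaultdict

D, TOL, STEP = 0.05, 0.002, math.radians(15)  # distance, tolerance, key step (assumed)
ITERS = 100                                   # the slides iterate the sampling 100 times

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a):   return math.sqrt(dot(a, a))
def unit(a):
    n = norm(a)
    return tuple(x / n for x in a)
def ang(u, v): return math.acos(max(-1.0, min(1.0, dot(u, v))))

def key(p1, n1, p2, n2):
    # Same hash function used to build the model descriptor.
    alpha = ang(n1, unit(sub(p2, p1)))
    beta = ang(n2, unit(sub(p1, p2)))
    omega = ang(n1, n2)
    return tuple(int(a / STEP) for a in (alpha, beta, omega))

def recognize(scene_pts, scene_nrm, model_table, iters=ITERS, seed=0):
    """RANSAC-style sampling: draw random scene point pairs separated by ~D,
    hash them with the same key function, and vote for the model pairs
    retrieved from the table.  (Rough-transform estimation, the full match
    criteria, and ICP refinement are omitted from this sketch.)"""
    rng = random.Random(seed)
    votes = defaultdict(int)
    for _ in range(iters):
        i, j = rng.sample(range(len(scene_pts)), 2)
        if abs(norm(sub(scene_pts[i], scene_pts[j])) - D) > TOL:
            continue  # reject pairs not separated by the model distance D
        k = key(scene_pts[i], scene_nrm[i], scene_pts[j], scene_nrm[j])
        for model_pair in model_table.get(k, ()):
            votes[model_pair] += 1
    return max(votes, key=votes.get) if votes else None
```

Each iteration is independent, which is why the slides note the process parallelizes easily; likewise, a single table holding pairs from several models lets one lookup vote for all of them at once.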
Implementation
• Result — see the object recognition video:
https://www.youtube.com/watch?v=h7whfei0fTw&feature=youtu.be
Performance
Reliability (w/ bear model)
• % false positives — depends on the scene
• Clean scene — <1%
• Noisy scene — 15%
• % negative results (cannot find the object)
• Clean scene — <1%
• Noisy scene — 25% (also takes longer)
• Effect of orientation on success ratio
• Model facing front — >99%
• Model facing backward — >99%
• Model facing sideways — 65%
Image: example of a false positive
Performance — Mobile
• Performance on a Cortex-A15 2 GHz ARM (on an Android mobile)
• Time to find one object
• Single-thread — 4 seconds
• Multi-thread & NEON — 1 second
• Time to find two objects
• Single-thread — 5.2 seconds
• Multi-thread & NEON — 1.4 seconds
Hardware Acceleration — FPGA (Xilinx Zynq)
• Select functions to be implemented in Zynq
• FPGA — matrix operations
• Dual-core ARM — data management + floating point
• Entire implementation done in C++ (Xilinx Vivado HLS)
Performance — Embedded using FPGA
• Note: Currently, only 30% of the computationally intensive functions are implemented on the FPGA, with the rest still running on the ARM A9. Therefore, it should be much faster once we can move most of these to the FPGA.
• Performance on Xilinx Zynq (Cortex-A9 800 MHz + FPGA)
• Time to find one object
• Zynq 7020 — 6 seconds
• Zynq 7045 (est.) — <1 second
• No test result for two objects, but it should scale the same way as on the ARM.
Lessons Learned
• The object recognition implemented is quite reliable
• The algorithm does a great job of recognizing multiple models with minimal penalty
• More improvement is needed for noisy environments and certain object orientations
• Additional improvement in performance is needed
• Algorithm — application-specific parameters (e.g., size of the model descriptor)
• ARM — NEON, algorithm improvement
• FPGA — optimize the use of the FPGA core
Summary
• Key implementation issues
• Model descriptor
• Data structure
• Sampling technique
• Performance
• IMPORTANT: Both ARM & FPGA provide scalability
• Therefore: Real-time object recognition was very difficult, but was successfully implemented on both mobile and embedded platforms
• LIVE DEMO AT THE BOOTH!
Resources
• www.vangoghimaging.com
• Android 3D printing: http://www.youtube.com/watch?v=7yCAVCGvvso
• “Challenges and Techniques in Using CPUs and GPUs for Embedded Vision,” Ken Lee, VanGogh Imaging — http://www.embedded-vision.com/platinum-members/vangogh-imaging/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
• “Using FPGAs to Accelerate Embedded Vision Applications,” Kamalina Srikant, National Instruments — http://www.embedded-vision.com/platinum-members/national-instruments/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit
• “Demonstration of Optical Flow algorithm on an FPGA” — http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/demonstration-optical-flow-algorithm-fpg
• Reference: “An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes,” Chavdar Papazov and Darius Burschka, Technische Universitaet Muenchen (TUM), Germany
Thank you