
Page 1: 3D Sensing and Mapping

3D Sensing and Mapping
B659: Principles of Intelligent Robot Motion
Spring 2013
Kris Hauser

Page 2: 3D Sensing and Mapping

Agenda
• A high-level overview of visual sensors and perception algorithms
• Core concepts:
  • Camera / projective geometry
  • Point clouds
  • Occupancy grids
  • Iterative closest points algorithm

Page 3: 3D Sensing and Mapping

• Proprioceptive: sense one’s own body
  • Motor encoders (absolute or relative)
  • Contact switches (joint limits)
• Inertial: sense accelerations of a link
  • Accelerometers
  • Gyroscopes
  • Inertial Measurement Units (IMUs)
• Visual: sense the 3D scene with reflected light
  • RGB: cameras (monocular, stereo)
  • Depth: lasers, radar, time-of-flight, stereo + projection
  • Infrared, etc.
• Tactile: sense forces
  • Contact switches
  • Force sensors
  • Pressure sensors
• Other
  • Motor current feedback: sense effort
  • GPS
  • Sonar

Page 4: 3D Sensing and Mapping

Sensing vs. Perception
• Sensing: acquisition of signals from hardware
• Perception: processing of “raw” signals into “meaningful” representations
• Example:
  • Reading pixels from a camera is sensing.
  • Declaring “it’s a rounded shape with skin color”, “it’s a face”, or “it’s a smiling face” are different levels of perception.
• Especially at lower levels of perception, like signal processing, the line is blurry, and the processed results can often be essentially considered “sensed”.


Page 5: 3D Sensing and Mapping

3D Perception Topics
• Sensors: visible-light cameras, depth sensors, laser sensors
• (Some) perception tasks:
  • Stereo reconstruction
  • Object recognition
  • 3D mapping
  • Object pose recognition
• Key issues:
  • How to represent and optimize camera transforms?
  • How to fit models in the presence of noise?
  • How to represent large 3D models?

Page 6: 3D Sensing and Mapping

Visual Sensors
• Visible-light cameras: cheap, low power, high resolution, high frame rates
  • Data: 2D field of RGB pixels
  • Stereo cameras
• Depth field sensors
  • Two major types: infrared pattern projection (Kinect, ASUS, PrimeSense) and time-of-flight (SwissRanger)
  • Data: 2D field of depth values
• Sweeping laser sensors
  • Data: 1D field of depth values
  • Hokuyo, SICK, Velodyne
  • Can be mounted on a tilt/spin mount to get a 3D field of view

[Images: Bumblebee stereo camera, ASUS Xtion depth sensor, Hokuyo laser sensor]

Page 7: 3D Sensing and Mapping

Sensors vary in strengths / weaknesses
• Velodyne (DARPA Grand Challenge)
  • 1.3 million readings/s
  • $75k price tag

Page 8: 3D Sensing and Mapping

Image formation
• Light bounces off an object, passes through a lens, and lands on a CCD pixel on the image plane
  • Depth of focus
  • Illumination and aperture
• Color: accomplished through use of filters, e.g. the Bayer filter
  • Each channel’s in-between pixels are interpolated

Page 9: 3D Sensing and Mapping

Idealized projective geometry
• Let:
  • Zim: distance from the image plane to the focal point along the depth axis
  • (X, Y, Z): point in 3D space relative to the focal point, Z > 0
• Then the image-space point is:
  • Xc = Zim X / Z
  • Yc = Zim Y / Z
• …which gets scaled and offset to give pixel coordinates (sketched below)

[Figure: side view of the pinhole model, with a point (X, Z) projecting through the focal point onto the image plane at (Xc, Zim)]
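A minimal sketch of this projection in Python; the scale and offset values (sx, sy, cx, cy) are illustrative placeholders, not values from the lecture:

```python
def project(X, Y, Z, z_im, cx=320.0, cy=240.0, sx=1.0, sy=1.0):
    # Ideal pinhole projection: similar triangles give the image-plane
    # coordinates, which are then scaled (sx, sy) and offset (cx, cy)
    # into pixel coordinates.
    assert Z > 0, "point must lie in front of the focal point"
    xc = z_im * X / Z
    yc = z_im * Y / Z
    return sx * xc + cx, sy * yc + cy
```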

Page 10: 3D Sensing and Mapping

Issues with real sensors
• Motion blur
• Distortion caused by lenses
• Non-square pixels
• Exposure
• Noise
  • Film grain
  • Salt-and-pepper noise
  • Shot noise

[Images: distortion, motion blur, exposure]

Page 11: 3D Sensing and Mapping

Calibration
• Determine the camera’s intrinsic parameters:
  • Focal length
  • Field of view
  • Pixel dimensions
  • Radial distortion
• These determine the mapping from image pixels to an idealized pinhole camera
• Rectification

Page 12: 3D Sensing and Mapping

Stereo vision processing
• Dense reconstruction
  • Given two rectified images, find the binocular disparity at each pixel
  • Take a small image patch around each pixel in the left image, and search for the best horizontally shifted copy in the right image (see the sketch below)
  • What size patch? What search size? What matching criterion?
  • Works best for highly textured scenes
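A brute-force sketch of the patch-matching idea, assuming rectified grayscale images as numpy arrays; the patch size and search range are exactly the tunable choices the slide asks about:

```python
import numpy as np

def disparity_map(left, right, patch=5, max_disp=64):
    # For each pixel in the left image, compare a small patch against
    # horizontally shifted patches in the right image (same row, per the
    # epipolar constraint of a rectified pair) and keep the shift with
    # the lowest sum of squared differences.
    h, w = left.shape
    r = patch // 2
    disp = np.zeros((h, w), dtype=np.int32)
    L, R = left.astype(np.float64), right.astype(np.float64)
    for y in range(r, h - r):
        for x in range(r, w - r):
            ref = L[y - r:y + r + 1, x - r:x + r + 1]
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - r) + 1):
                cand = R[y - r:y + r + 1, x - d - r:x - d + r + 1]
                ssd = np.sum((ref - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp
```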

Page 13: 3D Sensing and Mapping

Point Clouds

• Unordered list of 3D points P = {p1, …, pn}
• Each point optionally annotated by:
  • Color (RGB)
  • Sensor reading ID# (why?)
  • Estimated surface normal (nx, ny, nz)
• No information about objects, occlusions, topology
• Point Cloud Library (PCL): http://pointclouds.org (a minimal layout sketch follows)
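One plausible in-memory layout, loosely following PCL’s idea of per-point fields; the arrays and the comment answering the slide’s “(why?)” are illustrative assumptions, not PCL’s actual API:

```python
import numpy as np

# Parallel arrays, one row per point; optional fields stay aligned by
# sharing the same row index.
points  = np.empty((0, 3), dtype=np.float32)   # x, y, z
colors  = np.empty((0, 3), dtype=np.uint8)     # r, g, b (optional)
normals = np.empty((0, 3), dtype=np.float32)   # nx, ny, nz (optional)
scan_id = np.empty((0,),   dtype=np.int32)     # sensor reading ID:
# keeping the reading ID lets later stages re-weight or discard whole
# scans, e.g. after a loop closure corrects an earlier camera pose.

def append_scan(pts, ids, new_xyz, reading_id):
    # Add one scan's points while keeping the fields aligned.
    pts = np.vstack([pts, new_xyz.astype(np.float32)])
    ids = np.concatenate([ids, np.full(len(new_xyz), reading_id,
                                       dtype=np.int32)])
    return pts, ids
```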

Page 14: 3D Sensing and Mapping

3D Mapping
• Each frame of a depth sensor gives a narrow snapshot of the world geometry from a given position
• 3D mapping is the process of stitching multiple views into a global model

Page 15: 3D Sensing and Mapping

Three scenarios:

• Consider two raw point clouds P1 and P2 from cameras with transformations T1 and T2.
• Goal: build point cloud P in frame T1 (assume identity)
• Case 1: relative transformations known
  • Simple union P = P1 ∪ (T2⁻¹ P2), as sketched below
• Case 2: small transformation
  • Pose registration problem
  • Vast majority of points correspond between scenes
• Case 3: large transformation
  • A significant fraction of points do not correspond; lighting differences, more occlusions
  • Pose registration must be more robust to outliers
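A sketch of Case 1, under the assumption (implied by the slide’s formula) that T2 maps the common frame into camera 2’s frame, so its inverse brings P2 back into frame 1:

```python
import numpy as np

def merge_case1(P1, P2, T2):
    # T2 is a 4x4 homogeneous transform; applying its inverse,
    # x -> R^T (x - t), expresses P2 in the common frame (T1 = identity),
    # after which the map is just the union of the two clouds.
    R, t = T2[:3, :3], T2[:3, 3]
    P2_in_frame1 = (P2 - t) @ R        # rows: R^T (x_i - t)
    return np.vstack([P1, P2_in_frame1])
```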

Page 16: 3D Sensing and Mapping

Case 2: Small transformations
• Visual odometry: estimate the relative motion of subsequent frames using optical flow
  • Define feature points in P1 (e.g., with a corner detector)
  • Estimate the transformation of an image patch around each feature that best matches P2 (this defines the optical flow field)
  • Transformation: translation, rotation, scale
• Fit T2 to match these feature transforms (see the 2D sketch below)
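A 2D sketch of the flow-and-fit step using OpenCV; the particular function choices here are one common way to do it, not necessarily the lecture’s pipeline, and a full visual-odometry system would lift these matches to a 3D camera motion:

```python
import cv2
import numpy as np

def frame_motion(prev_gray, cur_gray):
    # Feature points in the previous frame (Shi-Tomasi corners).
    pts1 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                   qualityLevel=0.01, minDistance=8)
    # Track them into the current frame: pyramidal Lucas-Kanade optical
    # flow gives the per-feature displacement field.
    pts2, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                  pts1, None)
    ok = status.ravel() == 1
    # Fit a 4-DOF transform (translation + rotation + uniform scale)
    # to the surviving matches, with RANSAC rejecting bad tracks.
    M, _inliers = cv2.estimateAffinePartial2D(pts1[ok], pts2[ok])
    return M                                   # 2x3 matrix [s*R | t]
```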

Page 17: 3D Sensing and Mapping

Case 3: Large transformations
• Iterative closest point (ICP) algorithm
• Input: initial guess for T2
• Repeat until convergence:
  • Find nearest-neighbor pairings between P1 and T2 P2
  • Select those pairs that fall below some distance threshold (outlier rejection)
  • Assign an error metric and optimize T2 to minimize this metric

Page 18: 3D Sensing and Mapping


What metric? What criteria for outlier rejection? How to minimize? (A minimal end-to-end sketch follows; the next slides take up each question.)
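A minimal point-to-point ICP sketch in Python (numpy/scipy); the quantile-based rejection rule and the tolerances are illustrative choices, while the alignment step is the closed-form SVD solution named on the “How to optimize?” slide:

```python
import numpy as np
from scipy.spatial import cKDTree

def align_svd(P, Q):
    # Closed-form least-squares alignment: find R, t minimizing
    # sum_i ||R p_i + t - q_i||^2.
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # D guards against reflections
    return R, cq - R @ cp

def icp(P1, P2, T_init, max_iters=50, keep_frac=0.9):
    # Metric: point-to-point distance.  Criterion: drop the most distant
    # pairs.  Minimization: closed-form SVD (align_svd above).
    T = T_init.copy()
    tree = cKDTree(P1)                         # fast nearest neighbors
    for _ in range(max_iters):
        Q = P2 @ T[:3, :3].T + T[:3, 3]        # current guess T2 * P2
        d, idx = tree.query(Q)                 # pairings with P1
        keep = d <= np.quantile(d, keep_frac)  # outlier rejection
        R, t = align_svd(P2[keep], P1[idx[keep]])
        T_new = np.eye(4)
        T_new[:3, :3], T_new[:3, 3] = R, t
        if np.allclose(T_new, T, atol=1e-9):   # converged
            return T_new
        T = T_new
    return T
```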

Page 19: 3D Sensing and Mapping

What metrics for matching?
• Position
• Surface normal
• Color

• Nearest-neighbor methods
  • Fast data structures, e.g. k-d trees
  • For large scans, usually want a constant-sized subsample (see the sketch below)

• Projection-based methods
  • Render the scene from the perspective of T1 to determine matches
  • Very fast (used in the KinectFusion algorithm)
  • Only uses position information
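A small sketch of the constant-sized subsample idea with a k-d tree (scipy’s cKDTree; the sample size k is an arbitrary choice):

```python
import numpy as np
from scipy.spatial import cKDTree

def subsampled_pairs(P_target, P_source, k=2000, seed=0):
    # Query with a constant-sized random subsample of the source scan
    # so the per-iteration matching cost stays bounded no matter how
    # large the scans grow.
    rng = np.random.default_rng(seed)
    pick = rng.choice(len(P_source), size=min(k, len(P_source)),
                      replace=False)
    sub = P_source[pick]
    d, idx = cKDTree(P_target).query(sub)      # nearest-neighbor pairings
    return sub, P_target[idx], d
```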

Page 20: 3D Sensing and Mapping

What criteria for outlier rejection?
• Distance too large (e.g., the top X%)
• Inconsistencies with neighboring pairs
• On the boundary of the scan

Page 21: 3D Sensing and Mapping

How to optimize?

• Want to find the rotation R and translation t of T2 that minimize some error function

• Sum of squared point-to-point differences: E(R, t) = Σi ‖R pi + t − qi‖²
  • Closed-form solution (SVD)
  • Very fast per step

• Sum of squared point-to-plane differences: E(R, t) = Σi ((R pi + t − qi) · ni)²
  • Must use numerical methods (see the linearized step below)
  • Must deal with the rotation variable
  • Tends to lead ICP to converge in fewer iterations
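A sketch of one linearized point-to-plane step, using the standard small-angle approximation R ≈ I + [ω]×; this is one common way of handling the rotation variable numerically, not necessarily the lecture’s prescribed method:

```python
import numpy as np

def point_to_plane_step(P, Q, N):
    # Minimize sum_i ((R p_i + t - q_i) . n_i)^2 with R ~ I + [w]x.
    # The residual becomes linear in (w, t):
    #   (p_i - q_i) . n_i  +  w . (p_i x n_i)  +  t . n_i
    # so one step is a 6-variable linear least-squares solve.
    A = np.hstack([np.cross(P, N), N])         # rows: [p_i x n_i, n_i]
    b = -np.einsum('ij,ij->i', P - Q, N)       # -(p_i - q_i) . n_i
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    w, t = x[:3], x[3:]
    R = np.eye(3) + np.array([[0., -w[2], w[1]],   # I + [w]x; in practice
                              [w[2], 0., -w[0]],   # re-orthonormalize, e.g.
                              [-w[1], w[0], 0.]])  # via Rodrigues' formula
    return R, t
```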

Page 22: 3D Sensing and Mapping

Other applications of ICP
• Fitting 3D triangulated models to point clouds for object recognition / pose estimation

Page 23: 3D Sensing and Mapping

Stitching multiple scans
• In its most basic form, multi-view 3D mapping is simply a repetition of the two-camera case
• But there are two major issues:
  • Drift and “closing the loop”
  • Point clouds become massive after many scans

Page 24: 3D Sensing and Mapping

Point cloud growth problem
• With N points per frame, at F frames/sec, and T seconds of run time, NFT points are gathered
  • Kinect: N = 307,200, F = 30, T = 60 => 552,960,000 points
  • With RGB in 4 bytes and XYZ in 12 bytes => 8 GB / min
• Solutions:
  • Forget earlier scans (short-term memory)
  • Build a persistent, “collapsed” representation of the environment geometry
    • Polygon meshes
    • Occupancy grids
  • Key issue: how to estimate with a low # of points and update later?

Page 25: 3D Sensing and Mapping

Occupancy grids
• Store a grid G with a fixed minimum resolution
• Mark which cells (voxels) are occupied by a point
• Representation size is independent of T

Two options for updating on a new scan:
1. Compute ICP to align the current scan to the prior scan, then add points to G (sketched below)
2. Modify ICP to work directly with the representation G
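A minimal sketch of option 1’s “add points to G” step, keying a set by integer voxel indices so storage tracks the occupied volume rather than run time; the resolution value is arbitrary:

```python
import numpy as np

def voxelize(points, res=0.05):
    # Map each point to an integer voxel index; a set of indices marks
    # the occupied cells.  Storage grows with the occupied volume,
    # not with run time T.
    idx = np.floor(points / res).astype(np.int64)
    return set(map(tuple, idx))

# Option 1: after ICP has aligned the new scan into the map frame,
# fold its points into the grid:
#     G |= voxelize(aligned_scan)
```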

Page 26: 3D Sensing and Mapping

Probabilistic Occupancy Grids with Ray Casting
• Scans are noisy, so simply adding points is likely to overestimate the occupied cells
• Ray casting approaches:
  • Each cell has a probability of being free / occupied / unseen
  • Each scan defines a line segment that passes through free space and ends in an occupied cell (or near one)
  • Walk along the segment, increasing P(free(c)) for each encountered cell c, and finally increase P(occupied(c)) for the terminal cell c (see the sketch below)
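A sketch of that walk in log-odds form, the usual numerically convenient stand-in for raw probabilities (as in OctoMap); the increment values and the half-voxel stepping are simplifying assumptions, with exact grid traversal (e.g. Amanatides-Woo) used in practice:

```python
import numpy as np
from collections import defaultdict

L_FREE, L_OCC = -0.4, 0.85        # assumed log-odds increments
log_odds = defaultdict(float)     # unseen cells implicitly at 0 (p = 0.5)

def integrate_ray(origin, endpoint, res=0.05):
    # Walk the segment from the sensor origin to the measured point,
    # lowering the occupancy log-odds of traversed cells and raising
    # it for the terminal cell.
    direction = endpoint - origin
    n = max(int(np.linalg.norm(direction) / (0.5 * res)), 1)
    end_cell = tuple(np.floor(endpoint / res).astype(int))
    passed = set()
    for s in range(n):                         # half-voxel sampling steps
        c = tuple(np.floor((origin + direction * s / n) / res).astype(int))
        if c != end_cell:
            passed.add(c)
    for c in passed:
        log_odds[c] += L_FREE                  # evidence of free space
    log_odds[end_cell] += L_OCC                # evidence of occupancy
```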


Page 28: 3D Sensing and Mapping

Compact geometry representations within a cell
• On-line averaging
• On-line least-squares estimation of a fitting plane (both sketched below)
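A sketch of both ideas at once: accumulate per-cell moments so the mean and the least-squares plane can be recovered at any time and updated as later scans arrive (the class and field names are illustrative):

```python
import numpy as np

class CellStats:
    # Streaming per-cell statistics: point count, sum, and sum of
    # outer products are enough to recover the mean and best-fit plane.
    def __init__(self):
        self.n = 0
        self.s = np.zeros(3)        # sum of points
        self.ss = np.zeros((3, 3))  # sum of outer products

    def add(self, p):
        self.n += 1
        self.s += p
        self.ss += np.outer(p, p)

    def mean(self):
        return self.s / self.n

    def plane_normal(self):
        # Covariance from the accumulated moments; the least-squares
        # plane normal is the eigenvector of the smallest eigenvalue.
        m = self.mean()
        cov = self.ss / self.n - np.outer(m, m)
        w, v = np.linalg.eigh(cov)             # eigenvalues ascending
        return v[:, 0]
```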

Page 29: 3D Sensing and Mapping

Handling large 3D grids
• Problem: tabular 3D grid storage grows as O(N³)
  • A 1024³ grid is 1 GB even at one byte per cell
• Solutions:
  • Store a hash table of occupied cells only
  • Octree data structure (sketched below)
• OctoMap Library (http://octomap.sourceforge.net)
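A toy octree insert illustrating why the structure is compact: children are created only where points actually fall. This is a sketch only; the real OctoMap library adds occupancy probabilities, pruning, and serialization:

```python
class OctreeNode:
    # A node covers a cube (center, half-width) and subdivides only
    # where points fall, so empty space costs nothing.
    def __init__(self, center, half, min_half=0.05):
        self.center, self.half, self.min_half = center, half, min_half
        self.children = {}        # octant index -> child node
        self.occupied = False     # meaningful at leaf resolution

    def insert(self, p):
        if self.half <= self.min_half:         # reached leaf resolution
            self.occupied = True
            return
        # Octant of p relative to this node's center.
        i = tuple(int(p[k] >= self.center[k]) for k in range(3))
        if i not in self.children:
            child_center = [self.center[k] + (i[k] - 0.5) * self.half
                            for k in range(3)]
            self.children[i] = OctreeNode(child_center, self.half / 2,
                                          self.min_half)
        self.children[i].insert(p)
```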

Page 30: 3D Sensing and Mapping

Dynamic Environments
• Real environments have people, animals, and objects that move around
• Two options:
  • Map the static parts, assuming dynamic objects will average out as noise over time (probabilistic occupancy grids)
  • Segment (and possibly model) the dynamic objects

Page 31: 3D Sensing and Mapping

Related topics
• Sensor fusion
• Object segmentation and recognition
• Simultaneous Localization and Mapping (SLAM)
• Next-best-view planning

Page 32: 3D Sensing and Mapping

Next time
• Kalman filtering
• Welch and Bishop (2001)
• Principles Ch. 8
• Zeeshan