"video stabilization using computer vision: techniques for embedded devices," a...

Copyright © 2016 CEVA, Inc.

Ben Weiss

May 3, 2016

Video Stabilization Using Computer Vision:

Techniques for Embedded Devices


CEVA — The Leading Licensor of Ultra-low-power

Signal Processing IP’s for Embedded Devices

• Imaging & Vision

• Audio, Voice, Sensing

• Connectivity

• Communication

>7 Billion CEVA-powered devices shipped world-wide


Video Stabilization Using Computer Vision


• Rising demand for video cameras on moving platforms:

• Smartphones

• Wearable devices

• Automotive

• Drones

• Captured video usually suffers from:

• Shaky global motion

• Rolling shutter distortion

Why Do We Need Video Stabilization?


• Increases overall user satisfaction with visual quality

• Enables more efficient video compression

• Improves quality and performance of many vision tasks:

• More accurate and robust foreground object tracking

• Removing rolling shutter distortion → more robust object recognition



Electronic Image Stabilizer (EIS)

• Con: Unable to reduce motion blur

• Con: Estimates the motion from visual cues

• Pro: Minimizes low frequency vibrations

• Pro: Corrects complex and large motion

• Pro: Embedded into the silicon or as software with flexible upgrades

• Pro: Low power (~40 mW)

• Pro: Very low cost

Optical Image Stabilizer (OIS)

• Pro: Reduces motion blur

• Pro: Based on the true camera motion

• Pro: Minimizes high frequency vibrations

• Con: Limited motion range and DoF

• Con: Additional HW components inside the camera module

• Con: High power (~100-150 mW)

• Con: High cost ($2-$4)

Stabilizer Technologies


• Requires adaptive solutions for various scenarios

• Discriminates between desired and undesired motion

• Requires a complex model to reduce rolling shutter distortion

• Needs to deal with high data rate (up to UHD resolution, 60 fps)

• Needs to keep the power low for embedded applications

Stabilizer Challenges


• Camera motion estimation:

• Feature Detection

• Feature Tracking/Matching

• Motion Model Estimation

• Camera motion correction:

• Motion Smoothing

• Rolling Shutter Correction

• Frame Warping

Video Stabilization Using Computer Vision

[Figure: processing pipeline. Input Stream → Motion Estimation (Feature Detection → Tracking / Matching → Motion Model) → Motion Correction (Motion Smoothing → Rolling Shutter → Frame Warping) → Output Stream]


How do we select the best feature detector for video stabilization?

• Variety of detectors: Harris Corner, Shi–Tomasi, FAST, DoG…

• Required detector properties:

• Selectivity – Response only on corners and not on edges

• Critical for robust feature tracking/matching

• Repeatability – Consistent response for different frames

• Critical for feature matching

• Sensitivity – Improves frame spatial coverage

• Critical for accurate global motion estimation

• Invariance under translation, rotation, and scaling

• Intelligent selection of features is a key factor

Feature Detection
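The selectivity requirement can be illustrated with the Shi–Tomasi score, one of the detectors listed above. The sketch below is a minimal pure-Python illustration and not the presenter's implementation: the function names, the window radius, and the tiny synthetic images are assumptions chosen for the example. It computes the smaller eigenvalue of the local structure tensor, which is large on a corner and close to zero on a straight edge or a flat region.

```python
import math

def gradients(img):
    """Central-difference gradients for the interior pixels of a 2D list."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy

def shi_tomasi_score(img, cx, cy, radius=2):
    """Smaller eigenvalue of the structure tensor summed over a square window."""
    gx, gy = gradients(img)
    sxx = syy = sxy = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            sxx += gx[y][x] ** 2
            syy += gy[y][x] ** 2
            sxy += gx[y][x] * gy[y][x]
    # Eigenvalues of [[sxx, sxy], [sxy, syy]] are half_trace +/- disc.
    half_trace = (sxx + syy) / 2.0
    disc = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace - disc

# Synthetic 9x9 images (illustrative only): an L-shaped corner and a vertical edge.
SIZE = 9
corner = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(SIZE)]
          for y in range(SIZE)]
edge = [[1.0 if x >= 4 else 0.0 for x in range(SIZE)] for y in range(SIZE)]

corner_score = shi_tomasi_score(corner, 4, 4)
edge_score = shi_tomasi_score(edge, 4, 4)
print(corner_score, edge_score)  # the corner scores well above the edge
```

Thresholding this score keeps corners and rejects edges. Picking the strongest responses per image cell, rather than globally, is one way to also address the spatial coverage requirement.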



Feature tracking – optical flow (KLT)

• Pros: Accurate, local, and continuous response

• Cons: Prone to failure on large motion and illumination changes

• Scale pyramid expands motion range:

• Performance can be optimized by fusing feature detection and pyramid creation

• Tracking trade-off:

• Track length – Prefer strong features that last longer

• Frame coverage – Prefer weak but well-distributed features

Feature Tracking / Matching
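To make the "scale pyramid expands motion range" point concrete, here is a small sketch under stated assumptions. It builds a pyramid by 2x2 averaging (an assumed downsampling filter, not necessarily the one used in the talk) and estimates how many levels are needed if the tracker can only recover a few pixels of motion per level. The per-level range is a hypothetical parameter. A displacement of d pixels at full resolution shrinks to d / 2^L pixels at level L.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img, levels):
    """Return [full resolution, half, quarter, ...] with `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def levels_needed(max_motion_px, per_level_range_px):
    """Smallest level count whose coarsest level sees motion within range."""
    levels = 1
    while max_motion_px / (2 ** (levels - 1)) > per_level_range_px:
        levels += 1
    return levels

# Illustrative 16x16 image and a 3-level pyramid.
image = [[float((x + y) % 7) for x in range(16)] for y in range(16)]
pyramid = build_pyramid(image, 3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)
print(levels_needed(40, 5))
```

Tracking starts at the coarsest level and the estimate is doubled and refined at each finer level, so the cost of extra levels is modest compared with widening the search window at full resolution.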

[Figure: features tracked from Frame i to Frame i+1]


Feature matching – binary descriptors (BRIEF, FREAK)

• Pros: Handles arbitrarily large motions; invariant under illumination and geometric changes

• Cons: Prone to failure on repetitive textures

• Accelerating methods:

• Filter candidates for matching by:

• Location

• Scale

• Orientation

• Use inertial sensor data as a coarse prediction for location
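The location filter above can be sketched as follows. This is an illustrative example only: descriptors are represented as plain Python integers rather than real BRIEF or FREAK outputs, and the `max_shift` and `max_dist` thresholds are hypothetical values. Candidates outside the location gate are skipped before any Hamming distance is computed, which saves work and also helps on repetitive textures.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def match_features(prev, curr, max_shift=20.0, max_dist=8):
    """Match (x, y, descriptor) features; returns a list of (i, j) index pairs."""
    matches = []
    for i, (px, py, pdesc) in enumerate(prev):
        best_j, best_d = None, max_dist + 1
        for j, (cx, cy, cdesc) in enumerate(curr):
            # Location gate: skip candidates too far from the previous position.
            if abs(cx - px) > max_shift or abs(cy - py) > max_shift:
                continue
            d = hamming(pdesc, cdesc)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches.append((i, best_j))
    return matches

prev_feats = [(10, 10, 0b10110010), (100, 50, 0b01001101)]
curr_feats = [
    (104, 53, 0b01001111),  # moved slightly, one bit flipped
    (12, 11, 0b10110010),   # moved slightly, identical descriptor
    (300, 10, 0b10110010),  # same texture far away (repetitive pattern)
]
print(match_features(prev_feats, curr_feats))
```

With inertial sensor data, the gate would be centered on the predicted position instead of the previous position. Scale and orientation gates follow the same pattern.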




Efficient feature tracking/matching for embedded devices

• Exploits data parallelism using SIMD operations:

• Scatter-gather operations

• Parallel convolution

• Parallel arithmetic operations

• Uses fast local memories

• Uses fixed-point precision
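As an illustration of the fixed-point bullet, the sketch below runs a small sliding dot product (the inner loop of a convolution) in Q15 arithmetic. The Q15 format, the function names, and the sample values are assumptions made for this example; the deck does not state which format is used. Products are accumulated in a wide accumulator and rounded once at the end, which is the usual pattern on DSPs.

```python
Q = 15
ONE = 1 << Q  # 1.0 in Q15

def to_q15(x):
    """Quantize a float in [-1, 1) to a Q15 integer, with saturation."""
    v = int(round(x * ONE))
    return max(-ONE, min(ONE - 1, v))

def q15_filter(samples, taps):
    """Sliding dot product of Q15 samples with Q15 taps; returns Q15 outputs."""
    n = len(taps)
    out = []
    for i in range(len(samples) - n + 1):
        acc = 0  # wide accumulator holding Q30 products
        for k in range(n):
            acc += samples[i + k] * taps[k]
        out.append((acc + (1 << (Q - 1))) >> Q)  # round to nearest, back to Q15
    return out

taps_f = [0.25, 0.5, 0.25]
samples_f = [0.1, 0.2, 0.4, 0.2, 0.1]
fixed = q15_filter([to_q15(s) for s in samples_f], [to_q15(t) for t in taps_f])
as_float = [v / ONE for v in fixed]
print(as_float)  # close to the floating-point result [0.225, 0.3, 0.225]
```

The inner loop is a multiply-accumulate over contiguous data, which is the kind of operation that maps directly onto SIMD lanes.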




• Motion models:

• Translation (2 DoF): translation in X, Y

• Similarity (4 DoF): translation plus uniform scale and rotation

• Homography (8 DoF): translation, scale and rotation, plus skew and perspective

Motion Model Estimation
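The relationship between degrees of freedom and required data can be made concrete with a short sketch. The helper names below are hypothetical. Each feature correspondence contributes two equations (one for x, one for y), so a model with k DoF needs at least ceil(k / 2) correspondences. The similarity model is written as a 3x3 matrix so that the same point-mapping routine also works for a homography.

```python
import math

def similarity_matrix(tx, ty, scale, angle_rad):
    """4-DoF similarity transform as a 3x3 homogeneous matrix."""
    c = scale * math.cos(angle_rad)
    s = scale * math.sin(angle_rad)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]

def apply_transform(H, x, y):
    """Map a point through a 3x3 matrix, including the perspective divide."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

def min_correspondences(dof):
    """Each point pair gives two equations, so ceil(dof / 2) pairs are needed."""
    return math.ceil(dof / 2)

H = similarity_matrix(3.0, -2.0, 2.0, math.pi / 2)
mapped = apply_transform(H, 1.0, 0.0)
print(mapped)
print([min_correspondences(d) for d in (2, 4, 8)])
```

In practice many more correspondences than the minimum are used, so that noise and outliers can be averaged out or rejected.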

Higher DoF → more data required, higher CPU load

The selected model


• The camera motion model is estimated with the RANSAC algorithm:

• RANdom SAmple Consensus: an iterative method for estimating the parameters of a model from a set of observed data that contains outliers

Motion Model Estimation


• Assumptions:

• Global motion (background) can be estimated from the largest set of inliers

• Rolling shutter distortion can be neglected at this stage

• Conclusions:

• Features must be uniformly distributed for accurate estimation
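A minimal RANSAC sketch follows, under a deliberately simplified assumption: the model is a pure translation (2 DoF), so one correspondence is a minimal sample. The richer models listed earlier follow the same sample, score, and refit loop with larger minimal samples. The function name, tolerance, and iteration count are illustrative choices, not values from the talk.

```python
import random

def ransac_translation(matches, iters=100, tol=2.0, seed=0):
    """Estimate a global (dx, dy) from ((x0, y0), (x1, y1)) pairs with outliers."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        (x0, y0), (x1, y1) = rng.choice(matches)  # minimal sample: one match
        dx, dy = x1 - x0, y1 - y0
        inliers = [i for i, ((ax, ay), (bx, by)) in enumerate(matches)
                   if abs(bx - ax - dx) <= tol and abs(by - ay - dy) <= tol]
        if len(inliers) > len(best):
            best = inliers
    # Refit on the largest consensus set by averaging its displacements.
    n = len(best)
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best) / n
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best) / n
    return dx, dy, best

# Eight background features moving by about (5, -3), plus three features on
# a foreground object moving by (-20, 15). All values are made up.
noise = [(0.0, 0.0), (0.5, -0.5), (-0.5, 0.5), (0.25, 0.0),
         (0.0, -0.25), (-0.25, 0.25), (0.5, 0.5), (-0.5, -0.5)]
demo = [((10.0 * i, 7.0 * i), (10.0 * i + 5 + nx, 7.0 * i - 3 + ny))
        for i, (nx, ny) in enumerate(noise)]
demo += [((50.0 + i, 60.0), (30.0 + i, 75.0)) for i in range(3)]

dx, dy, inliers = ransac_translation(demo)
print(dx, dy, inliers)
```

The foreground features are rejected because the background forms the largest consensus set, which is the first assumption above. If the background features were clustered in one corner, the estimate would be biased toward that region, which is why uniform distribution matters.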


• Accumulates frame-to-frame camera motion to create the estimated camera path

• Uses Kalman filters to:

• Smooth the camera path

• Filter out the high frequency jitter

• Filters each motion component separately:

• Translation X

• Translation Y

• Scale

• Rotation

Motion Smoothing
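The smoothing step can be sketched for a single motion component. The example below is a minimal scalar Kalman filter with a constant-position state model; the model choice and the variance values are assumptions for illustration, since the deck does not specify them. Frame-to-frame deltas are first accumulated into a path, and the filter output is the smoothed path. The difference between the two is the correction applied to each frame.

```python
def accumulate(deltas):
    """Turn frame-to-frame motion into an absolute camera path."""
    path, total = [], 0.0
    for d in deltas:
        total += d
        path.append(total)
    return path

def kalman_smooth(path, process_var=1e-3, meas_var=0.25):
    """Scalar Kalman filter (constant-position model) over one motion component."""
    x, p = path[0], 1.0  # state estimate and its variance
    smoothed = [x]
    for z in path[1:]:
        p += process_var              # predict: uncertainty grows
        k = p / (p + meas_var)        # Kalman gain
        x += k * (z - x)              # update with the measured path value
        p *= (1.0 - k)
        smoothed.append(x)
    return smoothed

def roughness(path):
    """Sum of absolute second differences; lower means smoother."""
    return sum(abs(path[i + 1] - 2 * path[i] + path[i - 1])
               for i in range(1, len(path) - 1))

# A slow pan of 0.5 px/frame with +/-1 px alternating jitter (made-up data).
deltas = [0.5 + (1.0 if i % 2 == 0 else -1.0) for i in range(40)]
raw_path = accumulate(deltas)
smooth_path = kalman_smooth(raw_path)
corrections = [s - r for s, r in zip(smooth_path, raw_path)]
print(roughness(raw_path), roughness(smooth_path))
```

Translation X, translation Y, scale, and rotation would each be passed through their own filter instance. A constant-position model lags behind intentional pans; a constant-velocity state is a common alternative when that lag is visible.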



• Rolling shutter scan (CMOS sensors)

• Frame is captured line by line

• Camera or scene motion causes a distorted image

• Rolling shutter distortions:

• Low frequency motion → stretch/skew

• High frequency motion → wobble

Rolling Shutter Correction

• Our method for distortion correction:

1. Split the frame into horizontal strips

2. Estimate homography model for each strip

3. Ensure boundary conditions by spatial interpolation
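The three steps above can be sketched in a heavily simplified form. The example assumes each strip's model is just a horizontal shift rather than a full homography, so that the interpolation step is easy to follow; the function name and values are hypothetical. Per-strip shifts are treated as samples at the strip centers and interpolated linearly to every row, so the correction is continuous across strip boundaries.

```python
def per_row_shift(strip_shifts, height):
    """Interpolate per-strip shifts (defined at strip centers) to every row."""
    n = len(strip_shifts)
    strip_h = height / n
    centers = [(i + 0.5) * strip_h for i in range(n)]
    rows = []
    for y in range(height):
        if y <= centers[0]:
            rows.append(strip_shifts[0])       # above the first center: hold
        elif y >= centers[-1]:
            rows.append(strip_shifts[-1])      # below the last center: hold
        else:
            i = int((y - centers[0]) // strip_h)
            t = (y - centers[i]) / strip_h     # position between two centers
            rows.append((1 - t) * strip_shifts[i] + t * strip_shifts[i + 1])
    return rows

# Three strips over a 12-row frame, with made-up per-strip shifts in pixels.
row_shifts = per_row_shift([0.0, 4.0, 8.0], 12)
print(row_shifts)
```

Without the interpolation, every row in a strip would receive the same correction and visible seams would appear at strip boundaries. With per-strip homographies, the same idea applies to the model parameters or to the mapped pixel positions.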


• Stabilization process requires cropping a part of the image

• The ROI can be selected dynamically

• Larger motion leads to more aggressive cropping (typically ~10%)

• The ROI must be inside the input frame for best visual quality

• Warping models:

Frame Warping
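The cropping constraint can be illustrated with a small sketch. It assumes the crop fraction applies to each dimension and that the stabilizing correction is a simple translation of the ROI; both are simplifications, and the function name is hypothetical. The correction is clamped to the available margin so the ROI never leaves the input frame.

```python
def crop_roi(width, height, crop_frac, dx, dy):
    """Return (x0, y0, w, h): a centered ROI shifted by (dx, dy), kept in frame."""
    w = int(round(width * (1.0 - crop_frac)))
    h = int(round(height * (1.0 - crop_frac)))
    margin_x = (width - w) / 2.0
    margin_y = (height - h) / 2.0
    # Clamp the correction so the ROI stays inside the input frame.
    dx = max(-margin_x, min(margin_x, dx))
    dy = max(-margin_y, min(margin_y, dy))
    return margin_x + dx, margin_y + dy, w, h

centered = crop_roi(1920, 1080, 0.10, 0.0, 0.0)
clamped = crop_roi(1920, 1080, 0.10, 500.0, -500.0)
print(centered)
print(clamped)
```

With a 10% crop on a 1920x1080 frame, the margin is 96 pixels horizontally and 54 pixels vertically, which bounds the largest correction that can be applied without exposing pixels outside the frame. A dynamically selected ROI trades this margin against field of view.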



• A rich test set is necessary to cover different motion types

• RANSAC is a very powerful technique, but has shortcomings

• Spatial image coverage at feature detection and tracking/matching stages is very important for accurate motion estimation

• Overall, feature matching is superior to feature tracking

• Fixed-point accuracy is sufficient for the feature tracking stage

• Floating-point accuracy is required for motion model estimation and smoothing

• Smoothing algorithm complexity is low, but the quality effect is significant

• A homography motion model is required for best visual results

• Overall performance is dominated by frame warping

Lessons Learned


Video Stabilization Side-by-Side Example

Video can be found here: www.ceva-dsp.com/DVS





• Digital Video Stabilization: Smooth Footage without Expensive Mechanics

• www.embedded-vision.com/industry-analysis/technical-articles/digital-video-stabilization-smooth-footage-without-expensive-me

• Digital Video Stabilizer (DVS) Software - CEVA Website

• www.ceva-dsp.com/Digital-Video-Stabilizer

• CEVA demos @EVA Summit – come visit us there!

Resources


Backup Material


• More than 300 licensees to date

• >7 Billion CEVA-powered devices shipped worldwide to date

• 100 licensees of Wi-Fi & Bluetooth IP – and more than 1 billion chips shipped

• 3X the market share in DSP over any other DSP IP vendor

• 1 in 3 handsets worldwide are powered by CEVA DSP

• 5 billion DSP cores in audio/voice devices shipped to date

• >20 licensees for imaging and vision – shipping for first time in 2016

CEVA — The Leading Licensor of Ultra-low-power

Signal Processing IP’s for Embedded Devices


• Face Detection & Recognition

• Universal Object Recognition

• Pedestrian Detection

• ADAS Algorithms (FCW, LDW)

• 3D Depth Map Creation

• Digital Video Stabilizer (DVS)

• Super-Resolution (SR)

[Figure: platform block diagram with Hardware Layer, Software Layer, and App Dev. Kit (ADK) tiers. Block labels: CEVA-XM4 DSP Core; Hardware Development Kit; SW Toolset; RTOS; CEVA-CV Libraries; CPU-DSP Link – Communication Layer; Host CV / OpenVX API; CEVA CNN Framework (CDNN); Android Framework (AMF); CEVA Software Products; Partner Software Products. Callouts: Auto system handle; Provides OEM differentiation; CPU offload; Source code provided]

CEVA-XM4 Imaging & Vision IP Platform

Technology