lecture 15 ryuzo okada - vision processors for embedded computer vision
TRANSCRIPT
© 2014 Toshiba Corporation
Vision Processors for Embedded Computer Vision
Ryuzo Okada, Corporate R&D Center, Toshiba Corporation
July 18, 2014
Embedded computer vision
• Cameras are becoming ubiquitous: surveillance, automobiles, smartphones
SUBARU Eyesight: http://www.subaru.jp/about/technology/story/eyesight/eyesight01.html
Embedded computer vision: Requirements
• High performance for vision processing
– To provide valuable functions to users
– Real-time processing
• Low power consumption
– To reduce running cost
– At most a few watts, to allow fan-less cooling
• Robustness
– Long-term operation: 7-10 years
– Outdoor use: -40℃ to 85℃
– Shock-proof
A general-purpose CPU is not feasible for embedded computer vision processing.
What is needed is high performance per watt (e.g. GOPS/W).
Contents
• Types of vision processors
• Vision processors for automobiles
– Toshiba’s image recognition LSI, TMPV75 series
• Cloud computing and vision processors
– Surveillance camera
• Future direction and summary
Types of vision processors: (1) Vision chip
Logic-circuit-embedded image sensor
• Silicon retina [Mead89]
– Simulated the neural layers of the retina using analog circuits
– Early vision processing, e.g. smoothing
• Optical Neurochip [Nitta92]
– Implemented a neural network with optical circuits
– Alphabet recognition
Types of vision processors: (1) Vision chip
• Programmable Artificial Retina [Bernard93], Near Sensor Image Processing [Astrom96], Sensory Processing Element [Ishii96]
– Each pixel consists of a photodiode (PD) with a digital processing element (PE)
– Massively parallel (pixel-parallel) processing realized 1 ms visual servo control
• IVP MAPP [Johansson03], Column-parallel vision chip [Nakabo02]
– One PE is assigned to each column of the PD array
A vision chip can provide simple functions, e.g. smoothing and motion estimation.
Types of vision processors: (2) Discrete
[Figure: processors plotted by efficiency (e.g. GOPS/W) against type and flexibility (special purpose to general purpose). Processors shown: Intel ATOM, NVIDIA Tegra K1, ST Microelectronics EyeQ2, TOSHIBA TMPV75, Texas Instruments DaVinci, RENESAS SH7766. Application areas shown: tablet, mobile PC, smartphone, network camera, automobile.]
Architecture comparison
Trend: Heterogeneous multicore architecture
[Figure: block diagrams comparing six processors by CPU, media processor or DSP, SIMD engine, and accelerators. Labels recoverable from the slide:]
• TOSHIBA TMPV7506XBG and TMPV7528XBG [Tanabe12], [TMPV]: MeP 266MHz control processor; four MPEs at 266MHz; Affine Transform, Filter (180MHz, 64 PEs), Histogram, HOG, and Matching accelerators. One of the two diagrams additionally lists ARM Cortex-A9 300MHz x2.
• ST Micro EyeQ2 [EyeQ]: MIPS34K 332MHz x2; VMP x3; Classification, Preprocess Window, Filter (Integral Image), Disparity Finder, Tracker
• RENESAS SH7766 [SH]: SH4A 534MHz; IMP-X2 266MHz (IntegralImage etc.); IMR-X 1ch (Affine); IMR-LSX 4ch (Affine)
• NVIDIA Tegra K1 [Tegra]: ARM Cortex-A15 2.3GHz x4; CUDA 192 cores; ISP x2
• TI DaVinci TMS320DM814x [DaVinci]: ARM Cortex-A8 1GHz; DSP (C674x+) 750MHz; Resizer Accelerator (x 1/16 to 8)
• Labels that cannot be attributed to a processor from the extracted text: MPE 150MHz x3 with Affine Transform Accelerator; SIMD Engine 133MHz with PE 1 to PE 64
TMPV7506XBG Block Diagram
[Figure: block diagram of the TMPV7506XBG. Labels recoverable from the slide:]
• Video Input I/F: 4 camera inputs (RGB888 / 666 / 565, YCbCr422, BT.656, Y8 – Y12, 8-12bit Bayer)
• Video Output I/F: WVGA LCD Panel (RGB888 / 565)
• Media Processing Engine (MPE) #1 to #4: multi-core architecture for multiple (up to 4) applications, e.g. pedestrians, lanes & vehicles, traffic signs
• Accelerators for high performance image processing: Affine Transform, Filter 1, Filter 2, Histogram, Matching, HOG
• 32-bit RISC CPU, Main Memory Controller, On-chip 2MB RAM
• External memory: DDR2-533 SDRAM (16-bit x 2), NOR Flash (16-bit, 2CS)
• Peripherals: CAN, GPIO, Serial I/F (UART / SPI / I2C), Timer, PCM I/F (I2S, speaker), MCU I/F (CAN MCU), MediaLB / MOST, Input Capture / Output Compare / PWM, PCI Express
• Other external connections: LED (7-seg), other ECU
Architecture of TMPV7506XBG
• Heterogeneous multi-core architecture
• Multi-level parallelism
– Data level: SIMD
– Instruction level: VLIW
– Module level: Image Processing Accelerators (IPA)
– Thread level: multiple cores (4 MPEs)
• Fast image processing using 5 types of IPAs: HOG, Histogram, Filter, Matching, Affine
• Wide-band bus with a crossbar switch for parallel processing
• Flexible memory access optimization using internal memories and DMAs
[Figure: MPE 0 to MPE 3, each with Inst. Cache, Data Cache, Data RAM, DMAC, and an IVC2 coprocessor, sharing an L2 Cache; a MeP with Inst. Cache, Data Cache, Inst. RAM, Data RAM, and DMAC; the five image processing accelerators; DDR2 SDRAM Controller, NOR Flash/SRAM Controller, Working RAM, System ROM, System RAM x2, Serial I/F, Video Input I/F, Video Output I/F, PCI Express, MCU I/F, and CAN, all connected through the Crossbar Switch.]
Media Processing Engine (MPE)
• Media Processing Engine: 3 instructions/cycle by VLIW (Very Long Instruction Word) technology
• MeP core (Media embedded Processor)
– Toshiba original 32-bit RISC CPU core
– Low power consumption
• IVC2: coprocessor for media processing
– 2 instruction pipelines
– Each pipeline can execute a SIMD (Single Instruction Multiple Data) instruction
– A 64-bit register can handle eight 8-bit, four 16-bit, or two 32-bit data simultaneously
[Figure: MPE block diagram showing the MeP core (Inst. Decoder, Core Registers, 32bit ALU, Instruction Buffer (16/32), Instruction Cache, Data Cache, Data RAM) and the IVC2 coprocessor (Coprocessor Instr. Decoder, Coprocessor Registers, Pipe0 and Pipe1 ALUs) issuing Instruction-1, Instruction-2, and Instruction-3.]
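The idea of one 64-bit register holding eight 8-bit values can be sketched in plain Python. This is only an illustration of packed-lane arithmetic in general; the function names are hypothetical and nothing here reflects the actual IVC2 instruction set. It assumes wrap-around (modulo 256) addition in each lane.

```python
LANES = 8          # eight 8-bit lanes in one 64-bit word
LANE_BITS = 8
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every lane
HIGH = 0x8080808080808080   # top bit of every lane


def pack8(values):
    """Pack eight 8-bit values into one 64-bit integer (lane 0 is lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (LANE_BITS * i)
    return word


def unpack8(word):
    """Split a 64-bit integer back into eight 8-bit lane values."""
    return [(word >> (LANE_BITS * i)) & 0xFF for i in range(LANES)]


def simd_add8(a, b):
    """Add all eight lanes at once, wrapping modulo 256 in each lane.

    The low 7 bits are added first so no carry can cross a lane boundary;
    the top bit of each lane is then fixed up with XOR.
    """
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & HIGH)


a = pack8([1, 2, 3, 250, 128, 127, 0, 255])
b = pack8([1, 1, 1, 10, 128, 1, 0, 1])
print(unpack8(simd_add8(a, b)))
```

A single call to `simd_add8` performs eight independent 8-bit additions, which is the data-level parallelism the slide refers to.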
IPA: HOG module
Function: Fast computation of the HOG/CoHOG [Watanabe10] image feature, followed by linear SVM classification
Interface: In: gradient orientation image; Out: classification result / feature vector
Use case: Object (e.g. pedestrian) detection
[Figure: the HOG module computes the HOG/CoHOG feature vector f from the gradient orientation image and evaluates the linear SVM w^T f + b with parameters w, b.]
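The data flow of the HOG module (gradient orientation image in, linear SVM score out) can be sketched as follows. This is a simplified illustration under stated assumptions: a single histogram over the whole patch with no cell/block division or normalization, and made-up weights w and bias b. The function names are hypothetical and do not describe the hardware interface.

```python
def orientation_histogram(orientations, n_bins):
    """Count how often each quantized gradient orientation occurs."""
    hist = [0] * n_bins
    for row in orientations:
        for code in row:
            hist[code] += 1
    return hist


def linear_svm_score(w, f, b):
    """Evaluate the linear SVM decision value w^T f + b."""
    if len(w) != len(f):
        raise ValueError("weight and feature vectors must have equal length")
    return sum(wi * fi for wi, fi in zip(w, f)) + b


# Toy 2x3 gradient orientation image with 4 orientation bins (assumed values).
orientations = [[0, 1, 1],
                [2, 1, 0]]
f = orientation_histogram(orientations, n_bins=4)

# Hypothetical trained parameters; real ones come from SVM training.
w = [0.5, 1.0, -2.0, 0.0]
b = -1.5

score = linear_svm_score(w, f, b)
print(f, score, "target" if score > 0 else "background")
```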
Image feature: HOG and CoHOG
[Figure: HOG is a histogram of frequency over gradient orientations. CoHOG is a set of histograms of frequency over combinations of gradient orientations, taken between an origin pixel and other pixel positions.]
Flexibility of HOG module
The HOG module can flexibly compute different types of co-occurrence histograms depending on the input data.
• Gradient orientation input: CoHOG, which encodes shape
• LBP (subset) input: CoHLBP [Watanabe13], which encodes texture
[Figure: an encoded image (one gradient orientation or LBP code per pixel) enters the HOG module, which performs block division and builds a co-occurrence histogram for each pixel combination (different pairs of pixel positions), producing the feature vector.]
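A co-occurrence histogram over an encoded image can be sketched as below. The same routine works whether the codes are quantized gradient orientations (CoHOG) or LBP codes (CoHLBP), which is the flexibility described above. This is a minimal sketch: block division is omitted, and the function names and the choice of offsets are assumptions made for illustration.

```python
def cooccurrence_histogram(codes, offset, n_codes):
    """Count code pairs (c1, c2) for pixels separated by offset = (dy, dx)."""
    dy, dx = offset
    h, w = len(codes), len(codes[0])
    hist = [[0] * n_codes for _ in range(n_codes)]
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:   # skip pairs leaving the image
                hist[codes[y][x]][codes[y2][x2]] += 1
    return hist


def cooccurrence_feature(codes, offsets, n_codes):
    """Concatenate the flattened histograms of all offsets into one vector."""
    feature = []
    for offset in offsets:
        for row in cooccurrence_histogram(codes, offset, n_codes):
            feature.extend(row)
    return feature


codes = [[0, 1],
         [1, 0]]
print(cooccurrence_feature(codes, offsets=[(0, 1), (1, 0)], n_codes=2))
```

With K codes and P offsets the feature vector has P × K × K entries, which is why co-occurrence features are much higher dimensional than a plain histogram.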
IPA: Histogram module
Function: Fast histogram generation by parallel voting; data value conversion by a LUT
Interface: In: data array (e.g. image, 1D data array); Out: histogram / converted data array
Use case: Contrast enhancement by histogram equalization; vote counting for the Hough transform
[Figure: histogram of intensities; intensity conversion using a look-up table]
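The two functions of the Histogram module (histogram generation and LUT conversion) combine naturally into histogram equalization. The sketch below runs sequentially in plain Python and uses the common CDF-based equalization formula; it is an illustration of the use case, not the module's actual parallel-voting implementation, and all names are hypothetical.

```python
def histogram(pixels, levels):
    """Vote each pixel value into its bin."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    return hist


def equalization_lut(hist):
    """Build a look-up table that stretches the cumulative distribution."""
    levels = len(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next((c for c in cdf if c > 0), 0)
    span = running - cdf_min
    if span == 0:                      # flat or empty image: nothing to stretch
        return list(range(levels))
    return [max(0, round((c - cdf_min) * (levels - 1) / span)) for c in cdf]


def apply_lut(pixels, lut):
    """Convert every data value through the look-up table."""
    return [lut[p] for p in pixels]


pixels = [2, 2, 3, 3, 3, 4]            # low-contrast data with 8 levels
lut = equalization_lut(histogram(pixels, levels=8))
print(apply_lut(pixels, lut))
```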
IPA: Filter module
Function: Load the local image around a reference pixel, execute user-defined operations, and replace the reference pixel value with the result
Interface: In: image data array; Out: converted image
Use case: Various local operations, e.g. Gaussian filter, Sobel filter, median filter, Harris feature point extraction
[Figure: a load/store unit feeding 64 processing elements (PE 1 to PE 64) at 200MHz]
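The load / operate / replace pattern of the Filter module can be sketched with a generic local-operation driver that takes a user-defined function. The code below is sequential and purely illustrative (the hardware runs 64 PEs in parallel); the names are hypothetical, and leaving border pixels unchanged is an assumption of this sketch.

```python
def apply_local(image, op, radius=1):
    """Replace each interior pixel with op(window around that pixel)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]          # border pixels are left unchanged
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            window = [image[y + dy][x + dx]
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)]
            out[y][x] = op(window)
    return out


def median3x3(window):
    """Median of a 3x3 window given in row-major order."""
    return sorted(window)[len(window) // 2]


def sobel_x(window):
    """Horizontal Sobel response of a 3x3 window in row-major order."""
    right = window[2] + 2 * window[5] + window[8]
    left = window[0] + 2 * window[3] + window[6]
    return right - left


noisy = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
edge = [[0, 0, 10], [0, 0, 10], [0, 0, 10]]
print(apply_local(noisy, median3x3)[1][1], apply_local(edge, sobel_x)[1][1])
```

Swapping `op` is all that is needed to change the filter, which mirrors the "user-defined operations" in the function description.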
IPA: Affine module
• Affine transformation, specified by affine transformation parameters
• Lens distortion correction, specified by lens distortion parameters
• Arbitrary image deformation, specified by a conversion table
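Two of the deformation modes above can be sketched with inverse mapping, where every output pixel looks up its source position. The sketch assumes nearest-neighbor sampling and a fill value for out-of-range sources; the parameter layout `(a11, a12, tx, a21, a22, ty)`, the table format, and the function names are hypothetical and are not the module's actual interface.

```python
def warp_affine(image, m, fill=0):
    """Output pixel (x, y) samples the source at the affine map m of (x, y)."""
    a11, a12, tx, a21, a22, ty = m
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            xs = round(a11 * x + a12 * y + tx)   # nearest-neighbor sampling
            ys = round(a21 * x + a22 * y + ty)
            inside = 0 <= xs < w and 0 <= ys < h
            row.append(image[ys][xs] if inside else fill)
        out.append(row)
    return out


def remap(image, table, fill=0):
    """Arbitrary deformation: table[y][x] holds the source (xs, ys)."""
    h, w = len(image), len(image[0])
    return [[image[ys][xs] if 0 <= xs < w and 0 <= ys < h else fill
             for (xs, ys) in row]
            for row in table]


image = [[1, 2, 3],
         [4, 5, 6]]
print(warp_affine(image, (1, 0, 1, 0, 1, 0)))    # shift content left by one
flip = [[(2, 0), (1, 0), (0, 0)],
        [(2, 1), (1, 1), (0, 1)]]
print(remap(image, flip))                        # horizontal flip via table
```

Lens distortion correction fits the same pattern: a distortion model would generate the source coordinates instead of a fixed matrix or a stored table.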
IPA: Matching module
• Template matching by SAD (sum of absolute differences)
– Finds the position with the minimum SAD value
• Motion estimation: 2D search in a local rectangle, between the images at time t and time t+1
• Stereo disparity estimation: 1D search along an epipolar line, between the left image and the right image
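The 1D search for stereo disparity can be sketched on a single pair of rectified scanlines: a small template from the left row is compared against shifted positions in the right row, and the shift with the minimum SAD wins. This is a minimal illustration with hypothetical names; it assumes rectified images (so the epipolar line is the same row) and that the caller keeps the template inside the row.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def disparity_1d(left_row, right_row, x, half, max_disp):
    """Return (disparity, cost) with minimum SAD for the pixel at column x."""
    template = left_row[x - half:x + half + 1]
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        xr = x - d                      # candidate column in the right row
        if xr - half < 0:
            break                       # window would leave the image
        candidate = right_row[xr - half:xr + half + 1]
        cost = sad(template, candidate)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost


left = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
right = [0, 5, 9, 5, 0, 0, 0, 0, 0, 0]   # same pattern, shifted by 2 pixels
print(disparity_1d(left, right, x=4, half=1, max_disp=3))
```

Motion estimation uses the same cost but searches over both dx and dy inside a local rectangle between consecutive frames.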
Example of optimization: Back-over Prevention [Okada13]
• Back-over prevention using stereo cameras
– Collision warning when backing up, based on obstacle/pedestrian detection
• Uses commercial wide-angle cameras for back-view monitors
– Large lens distortion
[Figure: left image and right image]
Processing flow of Back-over Prevention
Image input → Undistortion → Depth estimation → Obstacle detection → Pedestrian detection → Warning
• Data along the flow: stereo image input, rectified image, disparity map (blue = far), detection result
• Pedestrian detection: image feature CoHOG, classifier linear SVM
Example of obstacle/pedestrian detection
Crouching person is detected as an obstacle
Implementation on TMPV7506XBG
• Each procedure is assigned to a suitable IPA or MPE (Procedure: IPA/MPE)
– Undistortion & rectification: Affine
– Luminance correction: Filter
– Depth estimation: Matching
– Depth→Color: Histogram
– Shrink: Affine
– Gradient orientation: Filter
– Obstacle detection: MPE
– Pedestrian detection: HOG
• Stages covered: image correction, depth estimation, obstacle detection, pattern recognition
Optimization process (1)
⓪ Before optimization: 1120 ms
① Use IPAs (sequential procedure): 45 ms (x25)
Optimization process (2)
② Run independent MPEs/IPAs in parallel: 42 ms (x1.1)
③ Optimize memory access: 33 ms (x1.3)
– Cache, DataRAM, WorkRAM, DMAC
④ Introduce pipeline procedure: 29 ms (x1.1)
– Perform “undistortion”@Affine for the upper half of the image
– When finished, start “luminance correction (zero mean)”@Filter for the upper half of the image while performing “undistortion”@Affine for the lower half
LSI power consumption is about 0.75 W.
[Figure: timelines of hardware (HWs) usage and display on the TMPV7506XBG over time, compared with the video rate]
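The speedup factors quoted for each optimization step follow directly from the processing times on the slides. The short calculation below reproduces them and compares the final time with a per-frame budget; the 30 frames/s video rate used for that budget is an assumption, since the slides only say "video rate".

```python
# Processing time per frame after each step, in milliseconds (from the slides).
timings_ms = [
    ("before optimization", 1120),
    ("use IPAs (sequential)", 45),
    ("run MPEs/IPAs in parallel", 42),
    ("optimize memory access", 33),
    ("introduce pipelining", 29),
]


def step_speedups(timings):
    """Speedup of each step relative to the step before it."""
    return [(name, prev / cur)
            for (_, prev), (name, cur) in zip(timings, timings[1:])]


for name, factor in step_speedups(timings_ms):
    print(f"{name}: x{factor:.1f}")

total = timings_ms[0][1] / timings_ms[-1][1]
print(f"overall: x{total:.1f}")

# Assumed video rate of 30 frames/s, i.e. about 33.3 ms per frame.
FRAME_BUDGET_MS = 1000 / 30
print("fits frame budget:", timings_ms[-1][1] <= FRAME_BUDGET_MS)
```

Most of the gain comes from moving work onto the accelerators; the later steps mainly create headroom below the frame budget.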
Future direction of TMPV family
• Improved pedestrian/vehicle detection
– Pattern recognition introducing color-based image features
– Multi-class classification
• Obstacle detection using a single camera
– 3D reconstruction (SfM)
• Realized by image processing accelerators
[Figure: next-generation LSI with enhanced pattern recognition (vehicles, pedestrians) and new 3D reconstruction (SfM)]
3D reconstruction (Structure from Motion)
• 3D shape (depth) estimation using a single camera
• Multiple images taken from different view angles ⇒ 3D reconstruction ⇒ 3D shape (depth) and camera motion
Obstacle detection based on SfM
• 3D position estimation using image frames captured at different moments
• Camera motion estimation: feature point, feature vector, feature matching, motion estimation ⇒ camera motion R, t
• 3D position estimation (point cloud): multi-view stereo matching ⇒ 3D point cloud
• Obstacle detection ⇒ obstacle position
[Figure: processing flow of the three stages above; it also carries the labels “Refine” and “(Every few frames)”.]
Accurate depth information
• Finding point correspondences using multiple images
⇒ Accurate disparity estimation
• Point correspondences are represented by a parametric probability distribution [Vogiatzis11]
⇒ Reduced memory consumption
Example of obstacle detection
Distance 32m Height 30cm
Pattern recognition
• Improved recognition accuracy using a new image feature,
Heterogeneous co-occurrence feature [Ito10]
– Extension of the CoHOG feature
– Combination of 4 types of color-based image features to describe shape and texture
Example:
color information can tell us the boundary of the pedestrian
Surveillance camera system
• Another frontier of embedded computer vision
• Current camera system
– Records video streams from cameras, and human observers look them over after something has happened
– Detects changes and motions
• Simply connecting to an existing surveillance camera system adds intelligent functions through image analysis, and sending only the necessary information to the cloud greatly reduces communication volume
• Incorporating the Visconti™2 image recognition processor, developed for automotive use, achieves low power consumption and high reliability
[Figure: current system of network cameras, hub, and recorder. Chart: surveillance camera sales (world), number of cameras (k). Source: 2011-2012 Surveillance camera market and business - CMOS, CCD camera series VOL.1 -]
Making camera system intelligent
• What is a suitable system configuration for video analysis using thousands of cameras?
• Cloud?
[Figure: in the current camera system, network cameras connect through a hub to a recorder. Image transfer to a data center is annotated with communication load and processing load. [Pham14]]
Intelligent surveillance camera system
Embedded vision processing can solve the problems
[Figure: vision processors in the network camera and in a video analysis set-top-box, placed alongside the hub and recorder, so that meta data as well as images can be sent to the data center. Other labels: face DB, face recognition, human identification, data size, processing load.]
Intelligent camera using TMPV7506XBG
• The TMPV7506XBG analyzes captured images inside the camera
• Example application: multiple object detection
– Four different types of objects are detected simultaneously
Total power consumption is 5-6 W.
Video analysis set-top-box using TMPV7506XBG
The set-top-box can analyze images from up to 4 cameras.
[Figure: camera images → Video Analysis STB → application on cloud (people count, trajectory)]
General trend of processor LSIs
• Accelerators are often used to realize specific applications
• Some of these technologies are then introduced into general-purpose processors to achieve higher efficiency
[Figure: efficiency (e.g. GOPS/W) over time. Application areas shown: 3D graphics, image compression, supercomputer, automotive, wearable (?). Technologies adopted by general-purpose processors: GPU, codec, SIMD.]
Future direction of vision processors
• Heterogeneous multicore architecture stays dominant
– CPU cores + GPGPU (+ Accelerators)
• Many functions will be realized by software after 2020
[Figure: efficiency (e.g. GOPS/W) over time (2005, 2010, 2015, 2020) for Intel ATOM, Tegra K1, EyeQ2, TMPV7506XBG, DaVinci, and SH7766, with a line marking the minimum performance required for practical apps. Annotations: “Limited users”, “Wider application range of CV will open up”, “Share”.]
Summary
• Types of vision processors
– Vision chip: logic-circuit-embedded image sensor
– Discrete LSI: heterogeneous multi-core architecture
• Vision processors for automobiles
– Toshiba’s TMPV family:
• 5 types of image processing accelerators
– Future direction
• Color-based image feature, multi-class classifier, SfM
• Vision processors will make surveillance cameras intelligent in an efficient way
– Efficiency is achieved by a good balance between on-site processing and cloud processing
• Future direction
– Progress in LSI technology will widen the range of CV applications
References
[Mead89] C. Mead, Analog VLSI and Neural Systems, Addison-Wesley, 1989.
[Nitta92] Y. Nitta, et al., Proposal of an Optical Neurochip with Internal Analogue Memory and Its Fundamental Characteristics, Japanese Journal of Applied Physics, Pt. 2, Letters, 31(8B), L1182-L1184, 1992.
[Bernard93] T. M. Bernard, Y. Zavidovique and F. J. Devos, A Programmable Artificial Retina, IEEE J. Solid-State Circuits, vol. 28, no. 7, pp. 789-798, 1993.
[Astrom96] A. Astrom, J.-E. Eklund and R. Forchheimer, Global Feature Extraction Operations for Near-Sensor Image Processing, IEEE Trans. Image Processing, vol. 5, no. 1, pp. 102-110, 1996.
[Ishii96] I. Ishii, et al., Target Tracking Algorithm for 1ms Visual Feedback System Using Massively Parallel Processing, Proc. IEEE Int. Conf. Robotics and Automation, pp. 2309-2314, 1996.
[Nakabo02] Y. Nakabo, et al., 3D Tracking Using Two High-Speed Vision Systems, Proc. of IEEE/RSJ Int. Conf. Intelligent Robots and Systems, pp. 360-365, 2002.
[Johansson03] R. Johansson, L. Lindgren, J. Melander and B. Moller, A Multi-Resolution 1000 GOPS 4 Gpixels/s Programmable CMOS Image Sensor for Machine Vision, Proc. IEEE Workshop on Charge-Coupled Devices and Advanced Image Sensors, 2003.
[Tanabe12] Y. Tanabe, et al., A 464GOPS 620GOPS/W Heterogeneous Multi-Core SoC for Image-Recognition Applications, ISSCC Dig. Tech. Papers, pp. 15-16, 2012.
[Watanabe10] T. Watanabe, et al., Co-occurrence Histogram of Oriented Gradients for Human Detection, IPSJ Trans. on Computer Vision and Applications, Vol. 2, pp. 39-47, 2010.
[Watanabe13] T. Watanabe and S. Ito, Two co-occurrence histogram features using gradient orientations and local binary patterns for pedestrian detection, Proc. of ACPR, pp. 415-419, 2013.
[Okada13] R. Okada, T. Watanabe, M. Nishiyama, A. Seki, T. Kozakaya, M. Banno, Multiple Object Detection using Image Recognition LSI for Automobiles, Proc. of 20th ITS World Congress, No. 4185, 2013.
[Vogiatzis11] G. Vogiatzis, et al., Video-based, real-time multi-view stereo, Image and Vision Computing, Vol. 29, No. 7, pp. 434-441, 2011.
[Pham14] Pham, et al., DIET: Dynamic Integration of Extended Tracklets for Tracking Multiple Persons, Proc. of ICPR, 2014 (to appear).
[Ito10] S. Ito and S. Kubota, Object Classification Using Heterogeneous Co-occurrence Features, Proc. of ECCV, 2010.
[TMPV] http://www.semicon.toshiba.co.jp/eng/product/assp/automotive/infotain/tmpv7500/index.html
[EyeQ] http://www.mobileye.com/technology/processing-platforms/eyeq2/
[SH] http://hk.renesas.com/applications/automotive/adas/surround/sh7766/index.jsp
[Tegra] http://www.nvidia.com/object/tegra-k1-processor.html
[DaVinci] http://www.tij.co.jp/jp/lit/ds/symlink/tms320dm8148.pdf
Product names (mentioned herein) may be trademarks of their respective companies.