2009dasip_01_2
Hardware Implementation of a Real Time Lucas and Kanade Optical Flow
N. Roudel, F. Berry, J. Serot
LASMEA
24 avenue des Landais
63177 Aubiere, France
roudel,berry,[email protected]
L. Eck
CEA List
8 route du Panorama, BP6
92265 Fontenay-aux-roses, France
Abstract
This paper presents an FPGA-based design aimed at real-time vision processing, and in particular optical flow estimation. The main goal of this work is to be embedded in a micro air robot in order to provide critical information for autonomous flight. The motion field is one of the dominant pieces of information for the safety of the robot. Based on this motion information, obstacle avoidance, for example, could be added to increase the degree of autonomy of the robot.
1 Introduction
For many years, numerous projects on the development of autonomous land or air robots (UAV, for Unmanned Aerial Vehicle) have been launched. Many reasons may explain the enthusiasm for these topics. Indeed, specific tasks can be unsafe or even impossible for a human operator (combat areas, nuclear radiation, hazardous areas, ...). However, in spite of the hostile environments, the robot's integrity must be ensured. For this reason, different navigation and exploration strategies can be used. In most of these strategies, knowledge of the ego-motion and the measurement of potential moving targets are keystones of numerous algorithms. Motion evaluation can be done by different kinds of sensors (inertial units, cameras, ...). Using a camera implies the computation of optical flow, which is defined as the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. However, the extraction of optical flow has a high computational cost, and the usual strategy to evaluate optical flow on an air robot consists in sending the image flow (via wireless communication) and computing the motion on remote hardware. Once the computation is done, a safety strategy is elaborated from this information and the appropriate action is sent back to the robot. This implies that on long-distance flights (considering a static processing base on the ground), loss of communication can occur and the UAV's safety is not guaranteed. Consequently, the autonomy of the robot's flight is highly limited. On the other hand, a UAV imposes several constraints such as weight and/or power consumption. Thus, the robot designer has to select the best match between sensing devices, algorithms and processing hardware.
In this work, we propose to use a camera associated with an FPGA-embedded optical flow algorithm in order to measure the motion field around the robot.
Several papers dealing with hardware implementations of optical flow computation can be found. [1] and [2] proposed optical flow implementations with a computation speed of about 20-30 FPS for small image resolutions (less than VGA). In our application, this speed is too low to safeguard the robot. In [3], a high-speed real-time optical flow implementation is presented on a PCI-e card including two FPGAs. This kind of work does not take the embedded aspect into account. Other papers, such as [4] and [5], proposed real-time optical flow estimations based on different algorithms.
The structure of this paper is as follows. In section 2, the Lucas and Kanade optical flow algorithm is presented and a few considerations on its implementation are proposed. The next sections (3 and 4) introduce the data flow design and propose the hardware implementation of the process. Finally, experimental results on a realistic image sequence, obtained by an implementation on our smart camera (SeeMOS), are given in section 5.
2 Lucas and Kanade Algorithm
Following the work of Barron [6], which compares the performance of correlation-based, gradient-based, energy-based and phase-based optical flow extraction methods, the Lucas and Kanade method [7] has been chosen for its non-recursive nature and low computational complexity while providing good accuracy. This method is a local method which is only valid for small
motions. The main error source of this algorithm is the well-known aperture problem [8].
In this gradient-based method, the velocity is computed from the first-order derivatives of the image brightness, using the motion constraint equation:

(∂I/∂x)(dx/dt) + (∂I/∂y)(dy/dt) + ∂I/∂t = 0   (1)

where I denotes the image intensity, and where we can define the motion U = (u, v)^T with u = dx/dt and v = dy/dt. Using the notation Ix = ∂I/∂x, Iy = ∂I/∂y and It = ∂I/∂t, Eq. 1 can be written as follows:

∇I^T · U = −It   (2)

The Lucas and Kanade approach can be considered as a minimization problem. Indeed, this method applies the least squares method to a region of interest Ω in the image. The velocity is computed for each pixel by the resolution of Eq. 4:

A^T W^2 A U = A^T W^2 b   (3)

U = [A^T W^2 A]^(-1) A^T W^2 b   (4)

where, for n points x_i ∈ Ω at a single instant t:

A = [∇I(x_1), ..., ∇I(x_n)]^T
W = diag[W(x_1), ..., W(x_n)]
b = −[It(x_1), ..., It(x_n)]^T
In order to avoid the non-invertibility of the A^T W^2 A matrix, a regularization coefficient α is added, according to [REF], giving Eq. 5:

A^T W^2 A = | Σ_Ω W(x)^2 Ix(x)^2 + α     Σ_Ω W(x)^2 Ix(x) Iy(x)  |
            | Σ_Ω W(x)^2 Ix(x) Iy(x)     Σ_Ω W(x)^2 Iy(x)^2 + α  |   (5)

A^T W^2 b = − | Σ_Ω W(x)^2 Ix(x) It(x) |
              | Σ_Ω W(x)^2 Iy(x) It(x) |   (6)

where W represents a diagonal matrix which weights the constraints, with higher weights around the center of Ω.
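As an illustration of Eqs. 3-6, the per-pixel computation can be sketched in software (a minimal numerical model, not the HDL; the function name, the use of NumPy, and the uniform weighting in the example are assumptions; the regularization coefficient is written alpha):

```python
import numpy as np

def lk_velocity(Ix, Iy, It, W, alpha=1.0):
    """Estimate (u, v) for one pixel from an n x n neighborhood of
    gradients, following Eqs. 3-6.  W is the n x n weighting window
    (higher weights at the center); alpha is the regularization
    coefficient added to the diagonal."""
    W2 = W ** 2
    # Entries of the 2x2 matrix A^T W^2 A (Eq. 5), regularized by alpha.
    ATA = np.array([
        [np.sum(W2 * Ix * Ix) + alpha, np.sum(W2 * Ix * Iy)],
        [np.sum(W2 * Ix * Iy),         np.sum(W2 * Iy * Iy) + alpha],
    ])
    # Right-hand side A^T W^2 b (Eq. 6), with the minus sign of b.
    ATb = -np.array([np.sum(W2 * Ix * It),
                     np.sum(W2 * Iy * It)])
    # U = [A^T W^2 A]^(-1) A^T W^2 b (Eq. 4).
    u, v = np.linalg.solve(ATA, ATb)
    return u, v
```

For a purely horizontal translation (Ix = 1, Iy = 0, It = −1 over the window), this returns u slightly below 1, the bias coming from the regularization term alpha.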
3 Design
3.1 Data flow
The data flow of the proposed algorithm is divided into four major parts, as shown in Fig. 1. The first one is the shaping of the image data, used to provide the necessary data for the next step of computation. The second is the 3D-gradient computation (x, y, t), applied to generate the primitive information for the optical flow estimation. The third consists in computing both matrices A^T W^2 A and A^T W^2 b, and the last one performs the matrix inversion and the optical flow computation itself.

[Figure 1. Design of the flow: image sequence → memory swapping → It, Ix, Iy gradient computation → least square matrices building (A^T W^2 A, A^T W^2 b) → matrix inversion → optical flow computation (u, v).]
3.1.1 Data Shaping Module
To perform the Lucas and Kanade optical flow estimation, two images are needed in order to compute It; on the other hand, Ix and Iy need three rows of the same image. For these reasons, the data shaping module controls the image flow using three memories, noted R1, R2, R3. To maintain a high level of synchronization, the input image is stored in a memory and a memory swapping process is applied, following the FSM of Tab. 1. Indeed, two frames are necessary to compute the t-gradient. To perform the shaping of n rows of the same image, a FIFO memory with a size equal to the image width is used.
3.1.2 3D-gradients computation
Since the previous module supplies only the useful information, the 3D-gradient computation becomes easier. Indeed, to perform the t-gradient, a subtraction between the data coming from two memories is needed. In order to minimize the noise, a threshold is applied to the subtraction. For the x and y gradients, the previous module provides an n×n matrix on which a convolution is applied with a predefined mask.
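A software sketch of this stage is given below (illustrative only: the threshold value and the central-difference masks are assumptions, since the paper does not specify the mask coefficients):

```python
import numpy as np

def gradients_3d(frame_prev, frame_curr, noise_threshold=4):
    """Sketch of the 3D-gradient stage.  The t-gradient is a thresholded
    frame difference; the x and y gradients use a predefined 3x3
    convolution mask (central-difference masks assumed here)."""
    f = frame_curr.astype(np.int32)
    # t-gradient: subtraction between the two frame memories;
    # small differences are clamped to zero to reduce noise.
    It = f - frame_prev.astype(np.int32)
    It[np.abs(It) < noise_threshold] = 0
    # x/y gradients: central differences over a 3-pixel neighborhood.
    Ix = np.zeros_like(f)
    Iy = np.zeros_like(f)
    Ix[:, 1:-1] = (f[:, 2:] - f[:, :-2]) // 2
    Iy[1:-1, :] = (f[2:, :] - f[:-2, :]) // 2
    return Ix, Iy, It
```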
Table 1. Memory Swapping Flow.

Current State   Action                                    Next State
S0              First writing in R1                       S1
S1              End of first writing in R1                S2
S2              First writing in R2                       S3
S3              End of first writing in R2                S4
S4              Writing in R3 and reading R2-R1           S5
S5              End of writing in R3 and reading R2-R1    S6
S6              Writing in R1 and reading R3-R2           S7
S7              End of writing in R1 and reading R3-R2    S8
S8              Writing in R2 and reading R1-R3           S9
S9              End of writing in R2 and reading R1-R3    S4
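The rotation encoded by states S4-S9 can be modeled in software as follows (an illustrative model of Tab. 1, not the HDL; names are hypothetical). After the two start-up frames (states S0-S3), the three memories rotate so that one is written while the two previous frames are read for the t-gradient:

```python
ROLES = [
    # (memory being written, (newer read memory, older read memory))
    ("R3", ("R2", "R1")),   # states S4/S5
    ("R1", ("R3", "R2")),   # states S6/S7
    ("R2", ("R1", "R3")),   # states S8/S9
]

def schedule(num_frames):
    """Return the (write, read) roles of the three memories for each
    frame arriving after the two initial writes."""
    return [ROLES[i % 3] for i in range(num_frames)]
```

The modulo-3 rotation matches the S9 → S4 loop of Tab. 1: every memory is overwritten only once the two frames it helped differentiate have been consumed.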
3.1.3 Optical flow computation
This module is the core of the design. The A^T W^2 A and A^T W^2 b matrices are computed first, separately. Once done, the velocity is obtained by multiplying the 2×2 matrix [A^T W^2 A]^(-1) by the 2×1 matrix [A^T W^2 b]. As for the previous module, a shaping stage is used to provide an n×n matrix of x gradients and another one of y gradients. Then the computation of [A^T W^2 A] proceeds with multiplications and additions. The next stage is the inversion of this 2×2 matrix to obtain [A^T W^2 A]^(-1); hence, a division by its determinant has to be done. Once the u and v computations are done, this information is available for post-processing.
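The inversion-and-multiplication step can be sketched as follows (a floating-point illustration of the arithmetic only; the hardware performs this with the fixed-point division process of section 4.2):

```python
def invert_2x2_and_multiply(m, b):
    """Invert the 2x2 matrix [A^T W^2 A] through its determinant and
    multiply by the 2x1 vector [A^T W^2 b] to obtain (u, v)."""
    (a11, a12), (a21, a22) = m
    det = a11 * a22 - a12 * a21        # division by the determinant
    u = ( a22 * b[0] - a12 * b[1]) / det
    v = (-a21 * b[0] + a11 * b[1]) / det
    return u, v
```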
4 Hardware Implementation
4.1 Least Square Matrices Building
To compute the motion field, the two matrices representing Eq. 5 and Eq. 6 are needed. To build these matrices, five products are required: Ix^2, Ix·Iy, Iy^2, Ix·It and Iy·It. For each product, an n×n matrix is generated and each element is weighted, with a higher weight for the central value; then all the elements of each matrix are summed. For a computation with n×n matrices, the hardware cost is given by:

Memory: 5 · ImageWidth · DataWidth · (n−1) bits.
Multiplications: 5 + (5 · n^2).
Additions: 5 · (n−1)^2.

For a good tradeoff between accuracy and hardware cost, the choice of a 3×3 matrix seems to be the most judicious. Indeed, for example, with a 3×3 matrix for an 800×600 image resolution and a data width of 20 bits, 120 Kbits of memory, 50 multipliers and 20 adders are needed, against 240 Kbits, 130 multipliers and 80 adders for a 5×5 matrix. Given that the goal of the optical flow computation is to provide critical information to other processes, the use of a 5×5 matrix could compromise the integration of these processes.
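These cost formulas can be evaluated directly (the multiplier and adder counts below reproduce the figures quoted above for n = 3 and n = 5; the memory expression is the formula as printed):

```python
def lsq_cost(n, image_width=800, data_width=20):
    """Hardware cost of the least-square-matrices stage for an n x n
    window, per the formulas of section 4.1."""
    memory_bits = 5 * image_width * data_width * (n - 1)
    multipliers = 5 + 5 * n * n        # 5 products + 5 n^2 weightings
    adders = 5 * (n - 1) ** 2          # 5 sums of n^2 terms each
    return memory_bits, multipliers, adders
```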
[Figure 2. Least Square Matrices Construction: Ix, Iy and It feed the five products (Ix^2, Ix·Iy, Iy^2, Ix·It, Iy·It); each product passes through data shaping, weighting, and a sum of all components.]
4.2 Matrix Inversion and Integer-to-Fixed-Point Conversion Unit

To compute the optical flow, the 2×2 matrix representing Eq. 5 must be inverted. In order to have the same data width on the numerator and on the denominator, the determinant is computed and truncated. To perform the division, a function generated by the MegaWizard Plug-In Manager included in the Quartus II software provided by Altera is used [13]. After timing simulation, if the pipelined mode is not used, the maximum speed of this block is not high enough, which is an important bottleneck for the design. Hence, the use of the pipelined mode, with a latency of one clock cycle, is necessary to keep up with the flow speed. To keep the accuracy of the computation, the division process shown in Fig. 3 is applied. Indeed, the quotient of the first division has only a few significant bits, due to the local estimation of the flow. Additional bits are therefore needed to increase the accuracy: the remainder of the first division is multiplied by 2^p, where p represents the number of fractional bits, and divided again by the determinant.
[Figure 3. Division Flow: the numerator and the denominator feed a first divider; the remainder is multiplied by 2^p and divided again by the denominator; the two quotients are concatenated to form the result.]
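This two-stage division can be sketched with integer arithmetic as follows (illustrative; non-negative operands are assumed here, sign handling being managed separately):

```python
def fixed_point_divide(numerator, denominator, p):
    """Two-stage integer division of Fig. 3: the integer quotient gives
    the high bits; the remainder, scaled by 2**p and divided again,
    gives p fractional bits.  The concatenation is the fixed-point
    result (value = result / 2**p)."""
    q1, r1 = divmod(numerator, denominator)
    q2 = (r1 << p) // denominator      # p fractional bits
    return (q1 << p) | q2              # concatenation of both quotients
```

For example, dividing 7 by 2 with p = 4 fractional bits yields the fixed-point word 56, i.e. 56 / 2^4 = 3.5.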
5 Results
5.1 Smart Camera SeeMOS
In order to validate this optical flow estimation, a smart camera research platform named SeeMOS is used. The SeeMOS architecture is presented in Fig. 4. This architecture is designed around an FPGA, more precisely an Altera Stratix EP1S60. The FPGA device plays the central role in the system, being responsible for interconnecting all the other hardware devices. Surrounding it, 10 Mb (5×2) of
Figure 4. Architecture of the SeeMOS smart camera.
SRAM and 64Mb of SDRAM are available for image and
other data storage.
The sensing board is composed, among others, of a CMOS imager (LUPA 4000) manufactured by Cypress. This 4-megapixel CMOS active pixel sensor features a synchronous shutter and a maximum frame rate of 15 fps at full resolution (2048×2048). The readout speed can be boosted by means of sub-sampling and windowed Region Of Interest (ROI) readout. High dynamic range scenes can be captured using the double and multiple slope functionality. The optical dynamic range can be up to 67 dB in single slope operation and up to 90 dB in multiple slope operation.
Finally, the communication between the smart camera and a PC is realized by the communication board using a FireWire (IEEE 1394) link. The main clock of the IEEE 1394 link, which sends one byte per cycle, is 20 MHz. In the presented design, 2 clock cycles are needed to obtain the velocity of one pixel. Thus, for a frame with a resolution of 500×500, the obtained frame rate is 40 frames per second. At full resolution (2048×2048), the frame rate is around 2.4 FPS. In order to display the motion field, a C++ library is used. This library, CImg [12], receives the u and v values for each pixel, builds the motion field and displays it. [9], [10] and [11] give more details about the SeeMOS platform.
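The quoted frame rates follow directly from the link clock and the per-pixel cost (a simple throughput model, frame rate = clock / (cycles per pixel × W × H)):

```python
def frame_rate(width, height, clock_hz=20_000_000, cycles_per_pixel=2):
    """Frame rate of the pipeline: a 20 MHz link clock with 2 clock
    cycles per pixel velocity."""
    return clock_hz / (cycles_per_pixel * width * height)
```

This gives 40 fps at 500×500 and about 2.38 fps at the full 2048×2048 resolution.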
5.2 Experiments results
To discuss the accuracy of the system, the classical set of image sequences of the domain has been used. Indeed, without this set, accuracy estimation would be impossible, since the true flow of a real scene image sequence is unknown. Thus, the Translating Tree and Diverging Tree sequences have been loaded into the external memories. To evaluate the performance of the presented system, two error measurements are applied. The first one is the Average Angular Error (AAE), Eq. 7. The angular error is the angle between the correct flow v_c = (u_c, v_c, 1)^T / sqrt(u_c^2 + v_c^2 + 1) and the estimated flow v_e = (u_e, v_e, 1)^T / sqrt(u_e^2 + v_e^2 + 1). The second one is the Root Mean Square error (RMS), Eq. 8:

AAE = (1 / (H·W)) Σ arccos(v_c · v_e)   (7)

RMS = sqrt( (1 / (H·W)) Σ ((u_c − u_e)^2 + (v_c − v_e)^2) )   (8)

where H and W represent the height and the width of the image.
Table 2. Translating and Diverging Tree error measures.

Sequence          AAE    RMS   Density   Parameters
Translating Tree  1.43   0.2   100%      α = 1
Diverging Tree    7      2.1   100%      α = 1
Tab. 3 shows the synthesis report for an image resolution of 800×600. As said before, the choice of a 3×3 least square matrices building stage is the most judicious; indeed, this stage uses 37% of the FPGA LEs. The whole optical flow computation reaches 42% of the device, which leaves enough resources to implement post-processes using this motion information.
6 Conclusion and Future Research
In this paper, an FPGA-based system to compute the well-known Lucas and Kanade optical flow algorithm is presented. The main goal of this system is to provide the motion field in real time. One of the most important aspects is the speed of the computation. Indeed, the presented processing aims at being embedded on an air robot flying at over 15 km/h, so the motion field has to be refreshed at a high rate.

The optical flow computation is the first step of a project for a flying methodology including time-to-contact estimation, obstacle avoidance and autonomous navigation. The main goal of future work is to propose a reconfigurable architecture dedicated to the autonomous flight of air robots.
References
[1] M.V. Correia and A. Campilho: A Pipelined Real-Time Optical Flow Algorithm, ICIAR, 2004.
Block                           Number of LEs   Memory bits         Maximum Clock Frequency
Data Shaping                    39 (< 1%)       12768 bits (< 1%)   176 MHz
Gradients Computation           27 (< 1%)       0                   66 MHz
Least Square Matrices Building  21094 (37%)     143280 (3%)         176 MHz
Matrix Inversion                2405 (4%)       0                   15 MHz
Optical Flow Computation        80 (< 1%)       0                   35 MHz

Table 3. Synthesis report.
[2] J. Diaz, E. Ros, F. Pelayo, E.M. Ortigosa and S. Mota: FPGA-based real-time optical-flow system, IEEE Trans. Circuits and Systems for Video Technology, 2006.
[3] Hiroshima University Robotics Laboratory website: http://www.robotics.hiroshima-u.ac.jp/hyper_human_vision/opt_flow-e.html
[4] P.C. Arribas: Real Time Hardware Vision System Applications: Optical Flow and Time to Contact Detector Units, IEEE International Caracas Conference on Devices, Circuits and Systems, 2004.
[5] M.M. Abulated, A. Hamdy, M.E. Abuelwafa and E.M. Saad: A Reliable FPGA-Based Real-Time Optical-Flow Estimation, International Journal of Computer Systems Science and Engineering, 2008.
[6] J.L. Barron, D.J. Fleet and S.S. Beauchemin: Performance of optical flow techniques, International Journal of Computer Vision, 1994.
[7] B.D. Lucas and T. Kanade: An iterative image registration technique with an application to stereo vision, Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981.
[8] K. Nakayama and G. Silverman, : The aperture
problem-I, Vision Research 28 - 1988.
[9] P. Chalimbaud and F. Berry: Embedded active vision system based on an FPGA architecture, EURASIP Journal on Embedded Systems, 2007.
[10] F. Dias, F. Berry, J. Serot and F. Marmoiton: Hardware Design and Implementation Issues on a FPGA-Based Smart Camera, First ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '07), 2007.
[11] SeeMOS Project website: http://wwwlasmea.univ-bpclermont.fr/Personnel/Francois.Berry/seemos.htm
[12] CImg website: http://cimg.sourceforge.net/
[13] Quartus II divider datasheet: http://www.altera.com/literature/ug/ug_lpm_divide_mf.pdf