

Hardware Implementation of a Real-Time Lucas and Kanade Optical Flow

N. Roudel, F. Berry, J. Serot
LASMEA
24 avenue des Landais
63177 Aubiere, France
roudel,berry,[email protected]

L. Eck
CEA List
8 route du Panorama, BP6
92265 Fontenay-aux-Roses, France
[email protected]

    Abstract

This paper presents an FPGA-based design that aims at applying real-time vision processes, and especially optical flow estimation. The main goal of this work is to be embedded in a micro air robot in order to provide critical information for autonomous flights. The motion field is thus one of the dominant pieces of information for the safety of the robot. Based on this motion information, obstacle avoidance, for example, could be added to increase the degree of autonomy of the robot.

    1 Introduction

For many years, many projects on the development of autonomous land or air robots (UAV, for Unmanned Aerial Vehicle) have been launched. Many reasons may explain such enthusiasm for these topics. Indeed, specific tasks can be unsafe or even impossible for human operators (combat zones, nuclear radiation, hazardous areas, ...). However, in spite of the hostile environment, the integrity of the robot must be ensured. For this reason, different navigation and exploration strategies can be used. In most of these strategies, the knowledge of ego-motion and the measurement of potential moving targets are keystones of numerous algorithms. The motion evaluation can be done by different kinds of sensors (inertial units, cameras, ...). Using a camera implies the computation of optical flow, which is defined as the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene. However, the extraction of optical flow has a high computational cost, and the usual strategy to evaluate optical flow on an air robot consists in sending the image flow (via wireless communication) and computing the motion on remote hardware. Once the computation is done, a safety strategy is elaborated from this information and the appropriate action is sent to the robot. This implies that on long-distance flights (considering a static processing base on the ground), loss of communication can occur and the safety of the UAV is not guaranteed. Consequently, the autonomy of the robot flight is highly limited. On the other hand, a UAV implies several constraints such as weight and/or power consumption. So, the robot designer has to select the best match between sensing devices, algorithms and processing hardware.

In this work, we propose to use a camera associated with an FPGA-embedded optical flow algorithm in order to measure the motion field around the robot.

Some papers dealing with hardware implementations of optical flow computation can be found. Thus, [1] and [2] proposed optical flow implementations with a computation speed of about 20-30 FPS for a small image resolution (less than VGA). In our application, this speed is too low for safeguarding the robot. In [3], a high-speed real-time optical flow implementation is presented, using a PCI-e card including two FPGAs. This kind of work does not take the embedded aspect into account. Other papers, such as [4] and [5], proposed real-time optical flow estimations based on different algorithms.

The structure of this paper is as follows. In section 2, the Lucas and Kanade optical flow algorithm is presented and a few considerations on its implementation are proposed. The next sections (3 and 4) introduce the data flow design and propose the hardware implementation of the process. Finally, experimental results on a realistic image sequence, obtained by an implementation on our smart camera (SeeMOS), are given in section 5.

    2 Lucas and Kanade Algorithm

Following the work of Barron et al. [6], which compares the performance of correlation-, gradient-, energy- and phase-based optical flow extraction methods, the Lucas and Kanade method [7] has been chosen for its non-recursive nature and its low computational complexity, while providing good accuracy. This method is a local method, which is only valid for small motions.


The main error source of this algorithm is the well-known aperture problem [8].

In this gradient-based method, velocity is computed from first-order derivatives of image brightness, using the motion constraint equation:

    \frac{\partial I}{\partial x}\frac{dx}{dt} + \frac{\partial I}{\partial y}\frac{dy}{dt} + \frac{\partial I}{\partial t} = 0    (1)

where I denotes the image intensity and where we can define the motion U = (u, v)^T with u = \frac{dx}{dt} and v = \frac{dy}{dt}. Using the notation I_x = \frac{\partial I}{\partial x}, I_y = \frac{\partial I}{\partial y} and I_t = \frac{\partial I}{\partial t}, Eq. 1 can be written as follows:

    \nabla I^T U = -I_t    (2)

The Lucas and Kanade approach can be considered as a minimization problem. Indeed, this method uses the least squares method applied to a region of interest \Omega in the image. The velocity is computed at each pixel by the resolution of Eq. 4:

    A^T W^2 A \, U = A^T W^2 b    (3)

    U = [A^T W^2 A]^{-1} A^T W^2 b    (4)

where, for n points x_i \in \Omega at a single instant t:

    A = [\nabla I(x_1), ..., \nabla I(x_n)]^T
    W = diag[W(x_1), ..., W(x_n)]
    b = -[I_t(x_1), ..., I_t(x_n)]^T

In order to avoid the non-invertibility of the A^T W^2 A matrix, a coefficient \alpha is added to its diagonal, according to *******REFFFFFFFFFFFFFFFF**********, giving Eq. 5:

    A^T W^2 A = \begin{bmatrix}
        \sum_x W(x)^2 I_x(x)^2 + \alpha & \sum_x W(x)^2 I_x(x) I_y(x) \\
        \sum_x W(x)^2 I_x(x) I_y(x) & \sum_x W(x)^2 I_y(x)^2 + \alpha
    \end{bmatrix}    (5)

    A^T W^2 b = -\begin{bmatrix}
        \sum_x W(x)^2 I_x(x) I_t(x) \\
        \sum_x W(x)^2 I_y(x) I_t(x)
    \end{bmatrix}    (6)

where W represents a diagonal matrix which weights the constraints, with higher weights around the center of \Omega.
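As an illustration of Eqs. 4-6, a minimal software sketch of the per-pixel resolution is given below (Python/NumPy; the window contents, the weights w and the coefficient alpha are illustrative, and this models the arithmetic only, not the hardware pipeline):

    import numpy as np

    def lk_velocity(Ix_win, Iy_win, It_win, w, alpha=1.0):
        """Velocity (u, v) at one pixel from n x n gradient windows,
        solving U = [A^T W^2 A]^-1 A^T W^2 b (Eq. 4)."""
        w2 = (w ** 2).ravel()
        ix, iy, it = Ix_win.ravel(), Iy_win.ravel(), It_win.ravel()
        # Eq. 5: 2x2 matrix, with alpha added on the diagonal.
        AtW2A = np.array([
            [np.sum(w2 * ix * ix) + alpha, np.sum(w2 * ix * iy)],
            [np.sum(w2 * ix * iy),         np.sum(w2 * iy * iy) + alpha]])
        # Eq. 6: 2x1 right-hand side.
        AtW2b = -np.array([np.sum(w2 * ix * it), np.sum(w2 * iy * it)])
        # Eq. 4: resolution (the hardware inverts the 2x2 matrix explicitly).
        return np.linalg.solve(AtW2A, AtW2b)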

    3 Design

    3.1 Data flow

    The data flow of the proposed algorithm is divided into

four major parts, as shown in Fig. 1.

Figure 1. Design of the flow: image sequence -> memory swapping -> gradient computation (I_t, I_x, I_y) -> least square matrices building (A^T W^2 A, A^T W^2 b) -> matrix inversion ([A^T W^2 A]^{-1}) -> optical flow computation -> (u, v).

The first part is the shaping of the image data, used to provide the necessary data for the next steps of the computation. The second is the 3D-gradient computation (x, y, t), applied to generate the primitive information for the optical flow estimation. The third consists in computing both matrices A^T W^2 A and A^T W^2 b, and the last one is the matrix inversion and the optical flow computation itself.

    3.1.1 Data Shaping Module

To perform the Lucas and Kanade optical flow estimation, two images are needed in order to compute I_t; on the other hand, I_x and I_y need three rows of the same image. For these reasons, the data shaping module controls the image flow by using three memories, noted R1, R2, R3. To maintain a high level of synchronization, the input image is stored in one memory while a memory swapping process is applied, following the FSM given in Tab. 1 (a behavioural sketch is given after the table). Indeed, two frames are necessary to compute the t-gradient. To perform the shaping of n rows of the same image, a FIFO memory with a size of the image width is used.

    3.1.2 3D-gradients computation

Since the previous module supplies only the useful information, the 3D-gradient computation becomes easier. Indeed, to perform the t-gradient, a subtraction between the data from two memories is needed. In order to minimize the noise, a threshold is applied to the subtraction. For the x and y gradients, the previous module provides an n x n matrix on which a convolution with a predefined mask is applied.
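A software sketch of this stage is given below (Python; the threshold value and the derivative mask are assumptions, since the paper only states that a predefined mask is used):

    import numpy as np
    from scipy.ndimage import convolve

    def gradients_3d(prev_frame, curr_frame, noise_threshold=4):
        """t-gradient as a thresholded frame difference; x/y gradients by
        convolution with a mask (central differences here, as an example)."""
        it = curr_frame.astype(np.float32) - prev_frame.astype(np.float32)
        it[np.abs(it) < noise_threshold] = 0.0  # threshold to reduce noise
        f = curr_frame.astype(np.float32)
        ix = convolve(f, np.array([[-0.5, 0.0, 0.5]]))     # x derivative mask
        iy = convolve(f, np.array([[-0.5], [0.0], [0.5]]))  # y derivative mask
        return ix, iy, it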


Table 1. Memory Swapping Flow.

Current State | Action                                 | Next State
S0            | First writing in R1                    | S1
S1            | End of first writing in R1             | S2
S2            | First writing in R2                    | S3
S3            | End of first writing in R2             | S4
S4            | Writing in R3 and reading R2-R1        | S5
S5            | End of writing in R3 and reading R2-R1 | S6
S6            | Writing in R1 and reading R3-R2        | S7
S7            | End of writing in R1 and reading R3-R2 | S8
S8            | Writing in R2 and reading R1-R3        | S9
S9            | End of writing in R2 and reading R1-R3 | S4
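The rotation of Tab. 1 can be sketched behaviourally as follows (Python; illustrative only, the real module is a hardware FSM): after two start-up writes, one memory is written while the two most recently written frames are read for the t-gradient.

    # Behavioural sketch of the memory swapping of Tab. 1: yields, for each
    # incoming frame, (memory written, frames read for the t-gradient).
    def memory_roles():
        yield ("R1", None, None)           # S0-S1: first writing in R1
        yield ("R2", None, None)           # S2-S3: first writing in R2
        rotation = [("R3", "R2", "R1"),    # S4-S5: write R3, read R2-R1
                    ("R1", "R3", "R2"),    # S6-S7: write R1, read R3-R2
                    ("R2", "R1", "R3")]    # S8-S9: write R2, read R1-R3
        i = 0
        while True:                        # S9 loops back to S4
            yield rotation[i % 3]
            i += 1

    # Example: roles for the first five frames.
    # import itertools; list(itertools.islice(memory_roles(), 5))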

    3.1.3 Optical flow computation

This module is the core of the design. The A^T W^2 A and A^T W^2 b matrices are, in a first step, computed separately. Once done, the velocity is obtained by a multiplication between the 2x2 matrix [A^T W^2 A]^{-1} and the 2x1 matrix A^T W^2 b. As for the previous module, a shaping stage is used to provide an n x n matrix of x gradients and another one of y gradients. Then the computation of the matrix A^T W^2 A is carried out with multiplications and additions. The next stage is the inversion of this 2x2 matrix to obtain [A^T W^2 A]^{-1}. Hence, a division by the determinant has to be done. Once the u and v computations are done, this information is available for post-processing.
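For reference, the 2x2 inversion reduces to forming the adjugate and dividing by the determinant (a plain sketch; in hardware the division is performed by the pipelined divider of section 4.2):

    # 2x2 matrix inversion by adjugate and determinant division (sketch).
    # The coefficient alpha of Eq. 5 keeps det away from zero.
    def invert_2x2(a, b, c, d):
        """Returns the inverse of [[a, b], [c, d]]."""
        det = a * d - b * c
        return [[ d / det, -b / det],
                [-c / det,  a / det]]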

    4 Hardware Implementation

    4.1 Least Square Matrices Building

To compute the motion field, the two matrices representing Eq. 5 and Eq. 6 are needed. To build these matrices, five products are required: I_x^2, I_x I_y, I_y^2, I_x I_t and I_y I_t. For each product, an n x n matrix is generated and each element is weighted, with a higher weight for the central value; then all the elements of each matrix are summed. For a computation with n x n matrices, the hardware cost is given by:

    Memory: 5 x ImageWidth x DataWidth x (n - 1) bits.
    Multiplications: 5 + (5 x n^2).
    Additions: 5 x (n - 1)^2.

For a good tradeoff between accuracy and hardware cost, the choice of a 3x3 matrix seems to be the most judicious. Indeed, with a 3x3 matrix, for an 800x600 image resolution and a data width of 20 bits, 120 Kbits of memory, 50 multipliers and 20 adders are needed, against 240 Kbits, 130 multipliers and 80 adders for a 5x5 matrix. Given that the goal of the optical flow computation is to provide critical information to other processes, the use of a 5x5 matrix could compromise the integration of these processes.

Figure 2. Least Square Matrices Construction: the I_x, I_y, I_t streams feed five multipliers (producing I_x^2, I_x I_y, I_y^2, I_x I_t, I_y I_t); each product goes through data shaping, weighting, and a sum of all components.

4.2 Matrix Inversion and Integer to Fixed-Point Conversion Unit

To compute the optical flow, the 2x2 matrix representing Eq. 5 must be inverted. In order to have the same data width on the numerator and on the denominator, the determinant is computed and truncated. To perform the division, a function generated by the MegaWizard Plug-In Manager included in the Quartus II software provided by Altera is used [13]. After timing simulation, if the pipelined mode is not used, the maximum speed of this block is not high enough, which is an important bottleneck for the design. Hence, the use of the pipelined mode, with a latency of one clock cycle, is necessary to keep the flow speed. To keep the accuracy of the computation, the division process shown in Fig. 3 is applied. Indeed, the quotient of the first division has only a few significant bits, due to the local estimation of the flow. Other bits are then needed to increase the accuracy; thus the remainder of the first division is multiplied by 2^p, where p represents the number of fractional bits, and divided again by the determinant.

Figure 3. Division Flow: the numerator and denominator feed a first divider; the remainder is multiplied by 2^p and fed to a second divider; the two quotients are concatenated into the result.
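Behaviourally, this two-stage division can be sketched as follows (Python, for non-negative operands; p is the number of fractional bits, and the concatenation relies on the second quotient fitting in p bits):

    # Sketch of the division flow of Fig. 3: a first integer division gives
    # the integer part; its remainder, multiplied by 2^p and divided again,
    # gives p fractional bits; the two quotients are concatenated.
    def fixed_point_divide(numerator, denominator, p):
        q1, r1 = divmod(numerator, denominator)  # first divider
        q2 = (r1 << p) // denominator            # remainder * 2^p, second divider
        return (q1 << p) | q2                    # concatenated fixed-point result

    # Example: fixed_point_divide(7, 3, 8) == 597, i.e. 597 / 2**8 = 2.33203125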

    5 Results

    5.1 Smart Camera SeeMOS

In order to validate this optical flow estimation, a smart camera research platform named SeeMOS is used. The SeeMOS architecture is presented in Fig. 4. This architecture is designed around an FPGA, more precisely an Altera Stratix EP1S60. The FPGA device plays the central role in the system, being responsible for interconnecting all the other hardware devices. Surrounding it, 10 Mb (5x2) of SRAM and 64 Mb of SDRAM are available for image and other data storage.

Figure 4. Architecture of the SeeMOS smart camera.

The sensing board is composed, among others, of a CMOS imager (LUPA 4000) manufactured by Cypress. This 4 megapixel CMOS active pixel sensor features a synchronous shutter and a maximum frame rate of 15 fps at full resolution (2048x2048). The readout speed can be boosted by means of subsampling and windowed Region Of Interest (ROI) readout. High dynamic range scenes can be captured using the double and multiple slope functionality. The dynamic optical range can be up to 67 dB in single slope operation and up to 90 dB in multiple slope operation.

Finally, the communication between the smart camera and a PC is realized by the communication board, using a FireWire (IEEE 1394) link. The main clock of the IEEE 1394 link, sending one byte per cycle, is 20 MHz. In the case of the presented paper, 2 clock cycles are needed to obtain the velocity of one pixel. Thus, for a frame with a resolution of 500x500, the obtained frame rate is 40 frames per second. At full resolution (2048x2048), the frame rate is around 2.4 FPS.
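These figures follow directly from the link clock and the two cycles per pixel (a quick check in Python):

    # Frame rate = link clock / (cycles per pixel * pixels per frame).
    clock_hz = 20e6                     # IEEE 1394 link clock, one byte/cycle
    for width, height in ((500, 500), (2048, 2048)):
        fps = clock_hz / (2 * width * height)    # 2 clock cycles per pixel
        print(f"{width}x{height}: {fps:.1f} FPS")  # 40.0 FPS and 2.4 FPS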

In order to display the motion field, a C++ library is used. This library, CImg [12], receives the u and v values for each pixel, builds the motion field and displays it.

[9], [10] and [11] give more details about the SeeMOS platform.

5.2 Experimental Results

To discuss the accuracy of the system, the classical set of image sequences of the domain has been used. Indeed, without this set, accuracy estimation would be impossible, due to the fact that the true flow of a real scene image sequence is unknown. Thus, the Translating Tree and Diverging Tree sequences have been loaded into the external memories. To evaluate the performance of the presented system, two error measurements are applied. The first one is the Average Angular Error (AAE), Eq. 7. The Angular Error is the angle between the correct flow \bar{v}_c = \frac{(u_c, v_c, 1)^T}{\sqrt{u_c^2 + v_c^2 + 1}} and the estimated flow \bar{v}_e = \frac{(u_e, v_e, 1)^T}{\sqrt{u_e^2 + v_e^2 + 1}}. The second one is the Root Mean Square error (RMS), Eq. 8:

    AAE = \frac{1}{H \cdot W} \sum \arccos(\bar{v}_c \cdot \bar{v}_e)    (7)

    RMS = \sqrt{\frac{1}{H \cdot W} \sum \left( (u_c - u_e)^2 + (v_c - v_e)^2 \right)}    (8)

where H and W represent the height and the width of the image.
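A reference implementation of these two measures (Python/NumPy sketch over dense (H, W) flow fields; the clipping guards arccos against rounding):

    import numpy as np

    # Eqs. 7-8 over dense correct (uc, vc) and estimated (ue, ve) flow fields.
    def aae_rms(uc, vc, ue, ve):
        H, W = uc.shape
        # Eq. 7: mean angle between the 3D-normalized flow vectors.
        dot = (uc * ue + vc * ve + 1.0) / (
            np.sqrt(uc ** 2 + vc ** 2 + 1.0) * np.sqrt(ue ** 2 + ve ** 2 + 1.0))
        aae = np.sum(np.arccos(np.clip(dot, -1.0, 1.0))) / (H * W)
        # Eq. 8: root mean square error on the flow components.
        rms = np.sqrt(np.sum((uc - ue) ** 2 + (vc - ve) ** 2) / (H * W))
        return aae, rms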

Sequence         | AAE  | RMS | Density | Parameters
Translating Tree | 1.43 | 0.2 | 100%    | \alpha = 1
Diverging Tree   | 7    | 2.1 | 100%    | \alpha = 1

Table 2. Translating and Diverging Tree error measures.

Tab. 3 shows the synthesis report for an image resolution of 800x600. As said before, the choice of a 3x3 least square matrix building is the most judicious; indeed, this stage uses 37% of the FPGA LEs. The whole optical flow computation reaches 42% of the device, which leaves a non-negligible amount of resources to implement post-processes using this motion information.

    6 Conclusion and Future Research

In this paper, an FPGA-based system to compute the well-known Lucas and Kanade optical flow algorithm is presented. The main goal of this system is to provide the motion field in real time. One of the most important aspects is the speed of the computation. Indeed, the presented processing aims at being embedded on an air robot flying at over 15 km/h, and so the motion field has to be refreshed at a high rate.

The optical flow computation is the first step of a project on a flying methodology including time-to-contact estimation, obstacle avoidance and also autonomous navigation. The main goal of the future work is to propose a reconfigurable architecture dedicated to the autonomous flight of air robots.

    References

[1] M.V. Correia and A. Campilho: A Pipelined Real-Time Optical Flow Algorithm, ICIAR, 2004.


Block                          | Number of LEs | Memory bits  | Maximum Clock Frequency
Data Shaping                   | 39 (< 1%)     | 12768 (< 1%) | 176 MHz
Gradients Computation          | 27 (< 1%)     | 0            | 66 MHz
Least Square Matrices Building | 21094 (37%)   | 143280 (3%)  | 176 MHz
Matrix Inversion               | 2405 (4%)     | 0            | 15 MHz
Optical Flow Computation       | 80 (< 1%)     | 0            | 35 MHz

Table 3. Synthesis report.

[2] J. Diaz, E. Ros, F. Pelayo, E.M. Ortigosa and S. Mota: FPGA-based real-time optical-flow system, IEEE Trans. Circuits and Systems for Video Technology, 2006.

[3] Hiroshima University Robotics Laboratory website: http://www.robotics.hiroshima-u.ac.jp/hyper human vision/opt flow-e.html

[4] P.C. Arribas: Real Time Hardware Vision System Applications: Optical Flow and Time to Contact Detector Units, IEEE International Caracas Conference on Devices, Circuits and Systems, 2004.

[5] M.M. Abulated, A. Hamdy, M.E. Abuelwafa and E.M. Saad: A Reliable FPGA-Based Real-Time Optical-Flow Estimation, International Journal of Computer Systems Science and Engineering, 2008.

[6] J.L. Barron, D.J. Fleet and S.S. Beauchemin: Performance of optical flow techniques, International Journal of Computer Vision, 1994.

[7] B.D. Lucas and T. Kanade: An iterative image registration technique with an application to stereo vision, Proceedings of the 7th International Joint Conference on Artificial Intelligence, 1981.

[8] K. Nakayama and G. Silverman: The aperture problem - I, Vision Research 28, 1988.

[9] P. Chalimbaud and F. Berry: Embedded active vision system based on an FPGA architecture, EURASIP Journal on Embedded Systems, 2007.

[10] F. Dias, F. Berry, J. Serot and F. Marmoiton: Hardware, Design and Implementation Issues on a FPGA-Based Smart Camera, First ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC '07), 2007.

[11] SeeMOS Project Website: http://wwwlasmea.univ-bpclermont.fr/Personnel/Francois.Berry/seemos.htm

[12] CImg Website: http://cimg.sourceforge.net/

[13] Quartus II divider datasheet: http://www.altera.com/literature/ug/ug lpm divide mf.pdf