Distributed Object Tracking with Robot and Disjoint Camera Networks. 2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Seattle, WA, USA, 21-25 March 2011.


Distributed Object Tracking with Robot and Disjoint Camera Networks

Junbin Liu #,†, Tim Wark #,†, Steven Martin #,†, Peter Corke #,†, Matthew D'Souza #

# Autonomous Systems Laboratory, CSIRO ICT Centre, Brisbane, Australia
{jim.liu, tim.wark, matt.dsouza}@csiro.au

† Faculty of Built Environment and Engineering, Queensland University of Technology, Brisbane, Australia
{junbin.liu, steven.martin, peter.corke}@qut.edu.au

Abstract—We describe a novel two-stage approach to object localization and tracking using a network of wireless cameras and a mobile robot. In the first stage, a robot travels through the camera network while updating its position in a global coordinate frame, which it broadcasts to the cameras. The cameras use this information, along with the image-plane location of the robot, to compute a mapping from their image planes to the global coordinate frame. This is combined with an occupancy map, generated by the robot during the mapping process, to track objects. We present results with a nine-node indoor camera network to demonstrate that the approach is feasible and offers an acceptable level of accuracy in object location.

Keywords—Camera Sensors, Robotics, Object Localization

I. INTRODUCTION

The first decade of wireless sensor network research focused primarily on energy management and reliable multi-hop radio communication, using 8-bit processors that support low-sample-rate scalar sensors and a sample-and-send collection-tree paradigm. We are interested in extending this paradigm to more complex sensors such as cameras, in understanding the network architectural implications of recent low-power 32-bit processors that allow sophisticated local processing of information, and in the synergy and cooperation between sensor networks and robots.

In this paper we present a tangible example of a camera sensor network cooperating with a robot in order to localize objects moving in a wireless camera network.
The approach has two stages: calibration and localization. In the first stage, the robot travels through the camera network while updating its position using a Simultaneous Localization and Mapping (SLAM) algorithm and constantly broadcasting its estimated position in a global coordinate frame. The robot locations are received by the cameras and, if the robot is in their field of view (FoV), recorded along with the corresponding location on the image plane. When a number of such corresponding world and image-plane coordinates have been accumulated, the cameras compute the mapping (a homography) between their image plane and the ground plane. In the second stage the calibrated cameras localize objects in the image plane, which they can then map to the global coordinate frame. This is combined, using a particle filter, with an occupancy map generated by the robot to track objects as they move within the coverage of the network.

II. MULTI-CAMERA CALIBRATION

A. Homography of the Ground Plane

Most surveillance networks monitor activities that take place on a common plane, e.g., people walking on a particular level of a building, or cars entering and exiting a car park. In these cases, the perspective effect introduced by the projection of 3D world points onto the image plane reduces to a mapping between the ground plane and the image plane of the camera.

More formally, the mapping can be expressed as x = HX, where H is the 3 × 3 homography matrix and x is the image of a ground point X. Since H has 8 degrees of freedom, a minimum of four point correspondences between the two planes is required.

B. Robot-Aided Ground Plane Calibration

The key requirement for computing the homography of the ground plane is to obtain a number of point correspondences between the two planes. The real-world points are provided by the SLAM system of a moving robot. As the robot moves within the fields of view of the cameras, it broadcasts its location to nearby cameras.
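To make the calibration step concrete, the sketch below shows how a ground-plane homography can be estimated from four or more robot/image correspondences with the standard Direct Linear Transformation, in plain NumPy. It is a minimal illustration under our own assumptions, not the authors' implementation; all function and variable names are ours.

```python
import numpy as np

def estimate_homography(ground_pts, image_pts):
    """Estimate H such that x ~ H X (DLT) from >= 4 correspondences.

    ground_pts, image_pts: (N, 2) arrays of matching points on the
    global ground plane and the camera image plane respectively.
    """
    A = []
    for (X, Y), (x, y) in zip(ground_pts, image_pts):
        # Each correspondence contributes two rows to the system A h = 0.
        A.append([-X, -Y, -1, 0, 0, 0, x * X, x * Y, x])
        A.append([0, 0, 0, -X, -Y, -1, y * X, y * Y, y])
    A = np.asarray(A, dtype=float)
    # h is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the free scale

def project(H, pt):
    """Apply a homography to a 2-D point (homogeneous normalisation)."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]
```

Once H is known, its inverse (`np.linalg.inv(H)`) gives the mapping used in the localization stage, from image-plane detections to global ground-plane coordinates.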
The cameras are capable of performing motion segmentation and blob detection to determine the location of the robot on the image plane. Once a number of point correspondences have been recorded, the homography is computed using the standard Direct Linear Transformation (DLT) algorithm.

The greatest advantage of such a robot-camera system is that not only can the homography of each camera be computed automatically, the homographies also project points on different image planes onto the same ground plane, allowing objects to be localized seamlessly across different camera views without assuming overlapping FoVs among cameras.

Work in Progress workshop at PerCom 2011

III. ARCHITECTURE

A. System Overview

The core problem we address is the localization of one or more objects by a camera network installed in a man-made environment, where the ground can be considered a flat plane. Locations in the environment may be viewed by zero, one or more cameras. The camera positions, orientations and heights above ground are not known. We do not explicitly determine the positions and orientations of the cameras; instead we create camera-specific mappings from each camera's image plane to the global coordinate frame. To achieve this, the known trajectory of a ground robot (as determined by SLAM) is used to map points within each camera's FoV to a global ground plane, allowing the ground-plane homographies to be determined. The network setup is shown in Figure 1.

Figure 1. Illustration of the layout of camera nodes and the intersection of camera FoVs with the robot trajectory.

B. System Components

Key software components in the system are spread over three main classes of devices: robots, camera nodes and a network base station, as shown in Figure 2. Camera nodes are responsible for all tracking within a camera FoV and for communicating with robot nodes in order to determine their ground-plane homography. Camera nodes also run a Collection Tree Protocol (CTP) stack to provide reliable wireless multi-hop transmission of tracking information to the base station. The robots run a standard Simultaneous Localization and Mapping (SLAM) algorithm which allows them to estimate their positions and derive a global map of the areas of interest. This map defines the global ground plane for which camera homographies are determined in the calibration phase. Finally, the base station runs a particle-filter-based tracking protocol. Inputs to the particle filter include the map (as determined from the robot SLAM) as well as the individual positions in the global ground plane observed by each camera.

IV. SPARSE OBJECT TRACKING

A particle filtering process is used to track the current position of the robot node using the target locations estimated by the camera nodes and the derived floor-plan map of the tracking area. The particle filtering process uses a Monte Carlo based multi-hypothesis estimation algorithm [1], which can be derived from recursive Bayesian estimation as a three-stage process consisting of prediction, correction and resampling.

We base our approach on Recursive Bayesian Estimation as described in [1], which allows us to estimate the position of an object with a set of weighted samples. In the prediction stage we calculate a new set of particle positions based on object location updates from the camera nodes and on the previous position and estimated speed of the robot node.
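The prediction, correction and resampling stages can be sketched as a single particle filter update. The sketch below is a minimal illustration under our own assumptions (Gaussian weighting of camera observations, a boolean occupancy grid for the wall check), not the authors' implementation; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def pf_step(particles, step, heading, obs=None, occupancy=None,
            cell=0.5, obs_sigma=0.3, noise=0.1):
    """One predict/correct/resample cycle of the tracker.

    particles: (N, 2) array of (x, y) position hypotheses.
    step, heading: last known step length l and heading theta.
    obs: optional camera-reported ground-plane position (x, y).
    occupancy: optional boolean grid (True = wall), `cell` metres/cell.
    """
    n = len(particles)

    # Prediction: advance along the last known heading, with noise,
    # as in Eq. (1): x += l*sin(theta), y += l*cos(theta).
    l = step + noise * rng.standard_normal(n)
    th = heading + noise * rng.standard_normal(n)
    new = particles + np.stack([l * np.sin(th), l * np.cos(th)], axis=1)

    # Correction: weight by the camera observation when one is available.
    if obs is not None:
        d2 = np.sum((new - obs) ** 2, axis=1)
        w = np.exp(-d2 / (2 * obs_sigma ** 2))
    else:
        w = np.ones(n)

    # Invalidate particles that ended up inside an occupied (wall) cell.
    if occupancy is not None:
        ij = np.clip(np.floor(new / cell).astype(int),
                     0, np.array(occupancy.shape) - 1)
        w[occupancy[ij[:, 0], ij[:, 1]]] = 0.0

    # Resampling: redraw particles in proportion to their weights.
    w = w / w.sum()
    return new[rng.choice(n, size=n, p=w)]
```

When no camera sees the target, `obs` is None and the filter simply dead-reckons, which matches the larger errors reported for the filtered data in uncovered regions.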
If there are no updates from the camera nodes, the particle filtering process uses the target's last known speed and heading to predict a new position:

    (p_x, p_y)_k^[i] = (p_x, p_y)_{k-1}^[i] + l_k^[i] * (sin θ_k^[i], cos θ_k^[i])    (1)

Finally, the correction stage determines the validity of the particle positions by detecting whether any physical barriers (i.e. walls) are crossed. In our case we used the map generated by the robot SLAM as an input to the particle filter correction stage. Once the particle weight factors have been determined, new particles are created by resampling the current set of particles according to the weight factors.

V. EVALUATION

A. Experimental Setup

The mobile robot we used is an iRobot Create¹ research platform equipped with a Hokuyo scanning laser range finder, a laptop computer running Linux and a wireless sensor node connected via a USB serial port (Fig. 3). The SLAM software, from ROS², runs on the laptop and uses the sensor data to continuously update a map of the environment and the robot's position. The robot node continually broadcasts the position estimate via the sensor node.

A network of nine low-power wireless cameras [2], [3] was deployed inside a building at an average spacing of roughly 10 m (Fig. 3). The network was first calibrated by the autonomously driving iRobot Create running SLAM. During installation we gained knowledge of the approximate FoV of each camera, which we then used to instruct the robot to visit multiple waypoints. For each coarsely known coverage area, the robot visited and broadcast its position at 16 locations, ensuring a high likelihood of at least 10 of them being within the actual FoV of each camera. Although a minimum of 4 pairs of correspondences is needed, we used 10 pairs for improved redundancy.

¹ http://www.irobot.com/Create
² Robot Operating System, from ros.org

Figure 2. Illustration of the key software components of the system.

In order to evaluate the localization performance of the camera network against ground-truth data, the same robot was used as the tracking target. Although any reasonably sized object could have been used, the SLAM data of the moving robot allowed us to evaluate the accuracy of tracking. The floor plan, the raw robot locations reported by the cameras and the ground-truth locations are shown in Fig. 5.

Figure 3. The iRobot Create (left) and the deployed cameras (right).

B. Results and Discussion

To evaluate the accuracy of our system, we computed the cumulative distribution function of the error of the unfiltered raw data and of the filtered continuous data. Error is defined as the distance between a measured/estimated location and the ground-truth location at the same time instant. The results are summarized in Fig. 4(a) and Fig. 4(b). Over 80% of the raw target locations reported by the cameras are within 0.4 m of the ground-truth location, while 80% of the filtered data generated by the particle filter has an error of less than 2 m.
This is expected, as the particle filter reports position estimates based on previous speed, heading and map information even when the object is not visible to any camera.

To assess the impact of camera density on tracking performance, we calculated the particle filter response when a varying number of cameras was removed from the network. The results are shown in Fig. 4(c). The graph clearly shows that as the number of cameras decreases, the performance of the position estimator degrades quickly. Currently we do not have the statistical significance to make a clear statement about the point at which increasing camera density yields diminishing returns; this will be a topic of future work.

Figure 4. (a) Error of the raw locations reported by the cameras. (b) Error of the filtered data, the output of passing the raw data through the particle filter. The filter attempts to predict object locations even in uncovered regions, so its output has a larger error. Vertical lines in (a) and (b) indicate the average error. (c) Error of the filtered data when a number of cameras are removed (all 9 cameras; camera 6 removed; cameras 12 and 9 removed; cameras 12, 3 and 5 removed).

The localization error of camera 7 in Fig. 5 is primarily due to the fact that its homography was computed from point correspondences occupying mostly the left half of the image. This can be corrected by planning a path that ensures points in the right half of the image are also covered during calibration. We will also provide a detailed analysis of errors in future work.

Figure 5. Raw object positions computed by the 9 cameras and the ground-truth trajectory. The floor plan was generated by the robot.

VI. RELATED WORK

Camera calibration is a well-researched topic in the computer vision community. Medeiros et al. [4] propose a distributed calibration protocol which relies on tracking one or more moving objects to calibrate nearby cameras with overlapping FoVs. In [5], a service mobile robot equipped with planar patterns collaborates with the camera sensor nodes in the environment and calculates their external parameters by communicating tracking information.
However, in order to distinguish the patterns, either high-resolution cameras are required or the cameras need to be mounted close to the patterns, limiting the coverage area.

In a number of settings where objects move on a flat surface, calibration can be simplified to computing the homography of the ground plane. For example, Bose and Grimson [6] present a fully automated technique for both affine and metric rectification of the ground plane (up to a scale factor) simply by tracking moving objects. Similar approaches can be found in [7]. The problem with these approaches is that the computed homographies do not share a common coordinate frame if the camera FoVs are disjoint. Another work related to ours can be found in [8], where the authors describe a distributed object tracking algorithm using a Kalman-filter-based approach and cameras with overlapping FoVs. Our system does not rely on overlapping fields of view in the camera network and is therefore less constrained.

VII. CONCLUSIONS AND FUTURE WORK

We have described a novel two-stage approach to object localization using a network of wireless cameras and a mobile robot. The robot provides global position data to determine local camera-image-to-ground-plane mappings, as well as a global occupancy grid to support a particle-filter-based tracking algorithm...