
Distributed Object Tracking with Robot and Disjoint Camera Networks

Junbin Liu #,∗, Tim Wark #,∗, Steven Martin #,∗, Peter Corke #,∗, Matthew D’Souza #

# Autonomous Systems Laboratory, CSIRO ICT Centre, Brisbane, Australia

{jim.liu, tim.wark, matt.d’souza}@csiro.au

∗ Faculty of Built Environment and Engineering, Queensland University of Technology, Brisbane, Australia

{junbin.liu, steven.martin, peter.corke}@qut.edu.au

Abstract—We describe a novel two-stage approach to object localization and tracking using a network of wireless cameras and a mobile robot. In the first stage, a robot travels through the camera network while updating its position in a global coordinate frame, which it broadcasts to the cameras. The cameras use this information, along with the image plane location of the robot, to compute a mapping from their image planes to the global coordinate frame. This is combined with an occupancy map generated by the robot during the mapping process to track the objects. We present results with a nine-node indoor camera network to demonstrate that this approach is feasible and offers an acceptable level of accuracy in terms of object locations.

Keywords-Camera Sensors, Robotics, Object Localization

I. INTRODUCTION

The first decade of wireless sensor network research has focussed primarily on energy management and reliable multi-hop radio communication using 8-bit processors supporting low-sample-rate scalar sensors and a sample-and-send collection tree paradigm. We are interested in extending this paradigm to more complex sensors such as cameras, in understanding the network architectural implications of recent low-power 32-bit processors which allow sophisticated local processing of information, and in the synergy and cooperation between sensor networks and robots.

In this paper we present a tangible example of a camera sensor network cooperating with a robot in order to localize objects moving in a wireless camera network. The approach has two stages: calibration and localization. In the first stage, the robot travels through the camera network while updating its position using a Simultaneous Localization and Mapping (SLAM) algorithm and constantly broadcasting its estimated position in a global coordinate frame. The robot locations are received by the cameras and recorded, along with the corresponding location on the image plane, if the robot is in their field of view (FoV). When a number of such corresponding world and image plane coordinates have been accumulated, the cameras compute the mapping (a homography) between their image plane and the ground plane. In the second stage the calibrated cameras localize objects in the image plane, which they can then map to the global coordinate frame. This is combined, using a particle filter, with an occupancy map generated by the robot to track objects as they move within the coverage of the network.

II. MULTI-CAMERA CALIBRATION

A. Homography of the Ground Plane

Most surveillance networks monitor activities that take place on a common plane — e.g. people walking on a particular level of a building, cars entering and exiting a car park, etc. In these cases, the perspective effect introduced by projection of 3D world points onto the image plane can be reduced to a mapping between the ground plane and the image plane of the camera.

More formally, the mapping can be expressed as x = HX, where H is the 3 × 3 homography matrix and x is the image of a ground point X. Since the H matrix has 8 degrees of freedom, a minimum of four point correspondences between the two planes is required.
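For reference, the following is a minimal sketch (Python/NumPy, not code from the paper) of how a known homography H maps between the ground plane and the image plane using homogeneous coordinates; the inverse mapping is what a calibrated camera later uses to report object positions in the global frame.

```python
import numpy as np

def ground_to_image(H, X):
    """Map a ground-plane point X = (X, Y) to image coordinates x = H X
    (homogeneous coordinates followed by normalization)."""
    Xh = np.array([X[0], X[1], 1.0])
    xh = H @ Xh
    return xh[:2] / xh[2]

def image_to_ground(H, x):
    """Inverse mapping: project an image point back onto the ground plane."""
    xh = np.array([x[0], x[1], 1.0])
    Xh = np.linalg.inv(H) @ xh
    return Xh[:2] / Xh[2]
```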

B. Robot Aided Ground Plane Calibration

The key requirement for computing the homography of the ground plane is to obtain a number of point correspondences between the two planes. The real-world points are provided by the SLAM system of a moving robot. As the robot moves within the fields of view of the cameras, it broadcasts its location to nearby cameras. The cameras are capable of performing motion segmentation and blob detection to determine the location of the robot on the image plane. Once a number of point correspondences are recorded, the homography is computed using a standard Direct Linear Transformation (DLT) algorithm.
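As a concrete illustration (again not the authors' code), a basic unnormalized DLT that recovers H from the accumulated robot (world) and blob-centroid (image) correspondences is sketched below; a production system would typically add Hartley normalization or use OpenCV's cv2.findHomography instead.

```python
import numpy as np

def homography_dlt(world_pts, image_pts):
    """Estimate the ground-plane homography H (x ~ H X) from N >= 4
    world/image correspondences via the standard DLT algorithm.
    world_pts, image_pts: arrays of shape (N, 2)."""
    assert len(world_pts) >= 4, "at least four correspondences are required"
    A = []
    for (X, Y), (u, v) in zip(world_pts, image_pts):
        A.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        A.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    A = np.asarray(A)
    # The solution is the right singular vector of A with the smallest
    # singular value (a least-squares fit when more than 4 pairs are used).
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```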

The greatest advantage of using such a robot-camera system is that not only can the homography of individual cameras be computed automatically, but the homographies also project points on different image planes onto the same ground plane, thus allowing objects to be localized seamlessly across different camera views without assuming overlapping FoVs among cameras.

III. ARCHITECTURE

A. System Overview

The core problem we address is localization of one or more objects through a camera network installed in a man-made environment, where the ground can be considered a flat plane. Locations in the environment can be viewed by zero, one or more cameras. The camera positions, orientations and heights above the ground are not known. We do not explicitly determine the positions and orientations of the cameras, but instead create camera-specific mappings from each camera's image plane to the global coordinate frame. To achieve this, the known trajectory of a ground robot (as determined by SLAM) is used to map points within each camera's FoV to a global ground plane, allowing their ground plane homographies to be determined. The network setup is shown in Figure 1.

Figure 1. Illustration of layout of camera nodes and intersection of camera FoV with robot trajectory. (The figure shows the global 2D ground plane with origin (0,0), fixed camera FoVs projected onto the ground plane at (x1,y1)–(x4,y4), the known robot trajectory derived from SLAM, and the unknown object trajectory.)

B. System Components

Key software components in the system are spread over three main classes of devices: robots, camera nodes and a network base station, as shown in Figure 2. Camera nodes are responsible for all tracking within a camera FoV and communicate with robot nodes in order to determine their ground-plane homography. Camera nodes also run a collection tree protocol (CTP) stack in order to provide reliable wireless multi-hop transmission of tracking information to the base station. The robots run a standard Simultaneous Localization and Mapping (SLAM) algorithm which allows them to estimate their positions and derive a global map of the areas of interest. This map is used to define the global ground plane for which camera homographies are determined in the calibration phase. Finally, the base station runs a particle filter based tracking protocol. Inputs to the particle filter include the map (as determined from the robot SLAM) as well as the individual positions in the global ground plane observed from each camera.
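To make the data flow of Figure 2 concrete, the sketch below shows hypothetical message layouts for the two links (robot-to-camera broadcast and camera-to-base-station CTP); the field names are illustrative assumptions, not taken from the actual implementation.

```python
from dataclasses import dataclass

@dataclass
class RobotPositionMsg:
    """Broadcast by the robot node (active message) during calibration."""
    x: float          # robot position in the global ground plane (m)
    y: float
    timestamp: float

@dataclass
class ObjectObservationMsg:
    """Sent by a camera node to the base station over CTP during tracking."""
    camera_id: int
    x: float          # object position mapped to the global ground plane (m)
    y: float
    speed: float      # estimated ground-plane speed (m/s)
    size: float       # blob size, e.g. in pixels
    timestamp: float
```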

IV. SPARSE OBJECT TRACKING

A particle filtering process is used to track the current position of the robot node using the target location estimated by the camera nodes and the derived floor-plan map of the tracking area. The particle filtering process uses a Monte Carlo based multi-hypothesis estimation algorithm [1]. The multi-hypothesis estimation algorithm can be derived from recursive Bayesian estimation, as a three-stage process consisting of prediction, correction and resampling.
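In the standard particle filter formulation (following [1] and the general literature; this equation is not reproduced from the paper itself), the posterior over the object state given the observations is approximated by a set of N weighted samples:

\[
p(\mathbf{x}_k \mid \mathbf{z}_{1:k}) \approx \sum_{i=1}^{N} w_k^{[i]} \, \delta\!\left(\mathbf{x}_k - \mathbf{x}_k^{[i]}\right),
\qquad \sum_{i=1}^{N} w_k^{[i]} = 1,
\]

where \(\mathbf{x}_k^{[i]}\) and \(w_k^{[i]}\) are the position and weight of the i-th particle at time step k.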

We base our approach around the use of recursive Bayesian estimation as described in [1], which allows us to estimate the position of an object with a set of weighted samples. In the prediction stage we calculate a new set of particle positions based on object location updates from the camera nodes, using the previous position and estimated speed of the robot node. If there are no updates from the camera nodes, the particle filtering process uses the target's last known speed and heading value to predict a new position as:

\[
\begin{pmatrix} p_x \\ p_y \end{pmatrix}_{k}^{[i]}
=
\begin{pmatrix} p_x \\ p_y \end{pmatrix}_{k-1}^{[i]}
+ \hat{l}_{k}^{[i]} \cdot
\begin{pmatrix} \sin\big(\hat{\phi}_{k}^{[i]}\big) \\ \cos\big(\hat{\phi}_{k}^{[i]}\big) \end{pmatrix}
\qquad (1)
\]

The correction stage then determines the validity of the particle positions by detecting if any physical barriers are crossed (i.e. walls, etc.). In our case we used the map generated by the robot SLAM as an input to the particle filter correction stage. Once the particle weight factor values have been determined, new particles are created by resampling the current set of particles according to each weight factor.
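The following is a compact, illustrative sketch (Python, not the deployed implementation) of the predict-correct-resample loop described above, using the prediction step of Eq. (1) and a map-based correction that invalidates particles whose motion crosses an occupied cell of the SLAM occupancy grid; the observation model and helper names are assumptions.

```python
import numpy as np

def crosses_wall(p0, p1, occupancy_map, cell_size):
    """Check whether the straight segment p0 -> p1 passes through an
    occupied cell of a boolean occupancy grid (row = y, col = x)."""
    steps = max(int(np.linalg.norm(p1 - p0) / cell_size) + 1, 2)
    for t in np.linspace(0.0, 1.0, steps):
        x, y = p0 + t * (p1 - p0)
        i, j = int(y / cell_size), int(x / cell_size)
        if 0 <= i < occupancy_map.shape[0] and 0 <= j < occupancy_map.shape[1]:
            if occupancy_map[i, j]:
                return True
    return False

def predict(particles, step_len, heading, noise=0.05):
    """Prediction per Eq. (1): advance each (x, y) particle by an estimated
    step length along an estimated heading, with small per-particle noise."""
    n = len(particles)
    l = step_len + noise * np.random.randn(n)
    phi = heading + noise * np.random.randn(n)
    particles[:, 0] += l * np.sin(phi)
    particles[:, 1] += l * np.cos(phi)
    return particles

def correct(particles, prev_particles, occupancy_map, cell_size, obs=None, sigma=0.5):
    """Correction: weight particles by agreement with a camera observation
    (if any) and zero the weight of particles that crossed a wall."""
    w = np.ones(len(particles))
    if obs is not None:
        d2 = np.sum((particles - obs) ** 2, axis=1)
        w *= np.exp(-d2 / (2 * sigma ** 2))
    for i, (q, p) in enumerate(zip(prev_particles, particles)):
        if crosses_wall(q, p, occupancy_map, cell_size):
            w[i] = 0.0
    s = w.sum()
    if s <= 0:                 # degenerate case: fall back to uniform weights
        w = np.ones(len(particles))
        s = w.sum()
    return w / s

def resample(particles, weights):
    """Resampling: draw a new particle set in proportion to the weights."""
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx].copy()
```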

V. EVALUATION

A. Experimental Setup

The mobile robot we used is an iRobot Create1 research platform equipped with a Hokuyo scanning laser range finder, a laptop computer running Linux and a wireless sensor node connected via a USB serial port (Fig. 3). The SLAM software, from ROS2, runs on the laptop and uses the sensor data to continuously update a map of the environment and the robot's position. The robot node continually broadcasts its position estimate via the sensor node.

A network of nine low-power wireless cameras [2], [3] was deployed inside a building at an average spacing of roughly 10 m (Fig. 3). The network was first calibrated by the autonomously driving iRobot Create running SLAM. During installation, we gained knowledge of the approximate FoV of each camera, which we then used to instruct the robot to visit multiple waypoints. For each coarsely known coverage area, the robot visited and broadcast its position at 16 locations, ensuring a high likelihood of at least 10 of them being within the actual FoV of each camera.

1 http://www.irobot.com/Create
2 Robot Operating System, from ros.org



Figure 2. Illustration of key software components of the system. Robot node: SLAM and active message broadcast of the robot's 2D position (x, y). Camera node: motion segmentation and blob detection, active message receive, ground plane homography computation, computation of object locations, and CTP send of the object's 2D position (x, y), speed and size. Base station: CTP receive and particle filter.

Although a minimum of 4 pairs of correspondences is needed, we used 10 pairs for improved redundancy. In order to evaluate the localization performance of the camera network against ground truth data, the same robot was used as the tracking target. Although any reasonably sized object could have been used, the SLAM data of the moving robot allowed us to evaluate the accuracy of tracking. The floor plan, the raw robot locations reported by the cameras and the ground truth locations are shown in Fig. 5.

Figure 3. The iRobot Create (left) and the deployed cameras (right)

B. Results and Discussion

To evaluate the accuracy of our system, we computed the cumulative distribution function of the error of the unfiltered raw data and the filtered continuous data. Error is defined as the distance between a measured/estimated location vector and the ground truth location vector at the same time instance. The results are summarized in Fig. 4(a) and Fig. 4(b). It can be seen that over 80% of the raw target locations reported by the cameras are within 0.4 m of the ground truth location, while 80% of the filtered data, generated by the particle filter, is associated with an error of less than 2 m. This is expected, as the particle filter reports position estimates based on previous speed, heading and map information even when the object is not visible to any camera.
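An empirical CDF of this kind can be computed as sketched below (illustrative only, not the authors' evaluation code), given time-aligned estimated and ground truth positions.

```python
import numpy as np

def error_cdf(estimates, ground_truth):
    """Empirical CDF of the Euclidean localization error.
    estimates, ground_truth: arrays of shape (N, 2), time-aligned positions."""
    errors = np.linalg.norm(np.asarray(estimates) - np.asarray(ground_truth), axis=1)
    errors = np.sort(errors)
    cdf = np.arange(1, len(errors) + 1) / len(errors)
    return errors, cdf   # e.g. plot(errors, cdf) yields a curve like those in Fig. 4
```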

To assess the impact of camera density on tracking performance, we calculated the particle filter response when a varying number of cameras were removed from the network. The results are shown in Fig. 4(c).

Figure 4. (a) Error of the raw locations reported by the cameras. (b) Error of the filtered data, obtained by passing the raw data through the particle filter. The particle filter attempts to predict object locations even in the uncovered regions and thus its output is associated with larger error. Vertical lines in both (a) and (b) indicate the average error. (c) Error of the filtered data when a number of cameras are removed. Each plot shows the cumulative distribution function against absolute error (m); the curves in (c) correspond to filtered data with all 9 cameras, with camera 6 removed, with cameras 12 and 9 removed, and with cameras 12, 3 and 5 removed.

The graph clearly shows that as the number of cameras decreases, the performance of the position estimator degrades quickly. Currently we do not have the statistical significance to make a clear statement about the point at which increasing camera density results in diminishing returns; this will be a topic of future work.



The localization error of camera 7 in Fig. 5 is primarily due to the fact that the homography was computed using point correspondences that occupied mostly the left half of the image. This problem can be corrected with a better planned path ensuring that points on the right half of the image are also covered during calibration. We will also provide a detailed analysis of errors in future work.

Figure 5. Raw object positions computed by the 9 cameras (cameras 3, 4, 5, 6, 7, 9, 10, 11 and 12) and the ground truth trajectory from SLAM, plotted on the floor plan generated by the robot (axes in metres).

VI. RELATED WORK

Camera calibration is a well researched topic in the computer vision community. Medeiros et al. [4] propose a distributed calibration protocol which relies on tracking one or more moving objects to calibrate nearby cameras with overlapping FoVs. In [5], a service mobile robot equipped with planar patterns collaborates with the camera sensor nodes in the environment and calculates their external parameters by communicating tracking information. However, in order to distinguish the patterns, either high-resolution cameras are required or the cameras need to be mounted close to the patterns, providing limited coverage area.

In a number of areas where objects move on a flat surface, calibration can be simplified to computing the homography of the ground plane. For example, Bose and Grimson [6] present a fully automated technique for both affine and metric rectification of this ground plane (up to a scale factor) by simply tracking moving objects. Similar approaches can be found in [7]. The problem with these approaches is that the computed homographies do not share a common coordinate frame if the camera FoVs are disjoint. Another work related to ours can be found in [8], where the authors describe a distributed object tracking algorithm using a Kalman filter based approach and cameras with overlapping FoVs. Our system does not rely on overlapping fields of view within the camera network and is therefore less constrained.

VII. CONCLUSIONS AND FUTURE WORK

We have described a novel two-stage approach to object localization using a network of wireless cameras and a mobile robot. The robot provides global position data to determine local camera image to ground plane mappings, as well as a global occupancy grid to support a particle filter based tracking algorithm for objects in the environment. Results have been presented using a nine-node indoor camera network that show acceptable levels of object location accuracy once ground-plane homographies have been determined during the robot-camera calibration phase.

Currently much of the computation is done centrally at the base station, although the algorithm can be decentralized. Our future work will demonstrate decentralized operation and show how robotic cameras can be integrated into the framework in order to cover areas of particular interest. We will also investigate ways in which prediction of the likely paths of sparse objects will allow us to adaptively duty-cycle camera nodes, greatly reducing their energy consumption while maintaining system tracking performance.

REFERENCES

[1] L. Klingbeil and T. Wark, "A wireless sensor network for real-time indoor localisation and motion monitoring," in Information Processing in Sensor Networks (IPSN '08), International Conference on, 2008, pp. 39–50.

[2] T. Wark, P. Corke, J. Liu, and D. Moore, "Design and evaluation of an image analysis platform for low-power, low-bandwidth camera networks," in Workshop on Applications, Systems, and Algorithms for Image Sensing, 2008.

[3] D. O'Rourke, R. Jurdak, J. Liu, D. Moore, and T. Wark, "On the feasibility of using servo-mechanisms in wireless multimedia sensor network deployments," in The 4th IEEE International Workshop on Practical Issues in Building Sensor Network Applications (SenseApp 2009), held in conjunction with LCN. Zurich, Switzerland: IEEE Press, 2009, pp. 826–833.

[4] H. Medeiros, H. Iwaki, and J. Park, "Online distributed calibration of a large network of wireless cameras using dynamic clustering," in Distributed Smart Cameras (ICDSC 2008), Second ACM/IEEE International Conference on, 2008, pp. 1–10.

[5] D. Meger, D. Marinakis, I. Rekleitis, and G. Dudek, "Inferring a probability distribution function for the pose of a sensor network using a mobile robot," in Robotics and Automation (ICRA '09), IEEE International Conference on, 2009, pp. 756–762.

[6] B. Bose and E. Grimson, "Ground plane rectification by tracking moving objects," in Proceedings of the Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2003.

[7] Z. Zhaoxiang, M. Li, K. Huang, and T. Tan, "Robust automated ground plane rectification based on moving vehicles for traffic scene surveillance," in Image Processing (ICIP 2008), 15th IEEE International Conference on, 2008, pp. 1364–1367.

[8] J. M. Sanchez-Matamoros, J. R. M. d. Dios, and A. Ollero, "Cooperative localization and tracking with a camera-based WSN," in Mechatronics (ICM 2009), IEEE International Conference on, 2009, pp. 1–6.
