
  • 8/8/2019 Bachelor Thesis - Aneesh Sharma

    1/32

    BACKGROUND MODELING USING PAN-TILTCAMERA

    by

    Aneesh Sharma

    Rahul Singhal

    COMMUNICATION AND COMPUTER Engineering

    LNM INSTITUTE OF INFORMATION TECHNOLOGY, JAIPUR

    May 2010


    CERTIFICATE

    It is certified that the work contained in the B.Tech. Project entitled Background

    modeling using Pan-Tilt camera by Aneesh Sharma (Y06UC012) and Rahul

    Singhal (Y06UC089) has been carried out under my supervision and that this work

    has not been submitted elsewhere for a degree.

    May, 2010 Prithwijit Guha

    Visiting Faculty

    Communication & Computer Engg.

    LNM Institute of Information Technology,

    Jaipur, Rajasthan


    Abstract

The use of autonomous pan-tilt cameras as opposed to static cameras can dramatically enhance the range and effectiveness of surveillance systems, but effective tracking in such pan-tilt scenarios remains a challenge. Existing approaches for constructing background models fail here since they are designed for static cameras. In this paper we estimate the camera motion parameters and use them to update the background model online in the presence of scene activity. Camera motion is estimated as the median learned over the flow matrix between consecutive frames. Foreground regions are detected as changes using a single Gaussian model, and are thus skipped during background model construction.


    Dedicated to our parents


    Acknowledgments

We would like to thank all our colleagues for keeping us sane, our parents for supporting us through all of this, and Mr. Prithwijit Guha for his support and valuable suggestions, without which we would not have been able to carry out this work.


    Contents

1 Introduction

2 Foreground Extraction

3 Inter-Frame Motion Estimation

3.1 Estimation of optical flow

3.2 Algorithms for calculating optical flow

3.2.1 Horn-Schunck

3.2.2 Lucas-Kanade

3.2.3 Pyramidal Lucas-Kanade

4 Mosaiced Background Model

4.1 Flow estimation

4.2 Foreground Segmentation

4.2.1 Single Gaussian Model

4.3 Background Stitching

5 Conclusion


    List of Figures

1.1 Stationary camera vs. pan-tilt camera

1.2 Flowchart of the PT camera based background modeling

2.1 Foreground detection output

2.2 Foreground detection output

3.1 Horn-Schunck output

3.2 Lucas-Kanade output

3.3 Pyramidal Lucas-Kanade output

4.1 Block diagram of proposed algorithm

4.2 Flow density

4.3 LK pyramidal flow calculated on test case

4.4 Single Gaussian output on test case

4.5 Final mosaiced image of test case


    Chapter 1

    Introduction

Active video surveillance involves using several static cameras so as to have a better perception of the situation [3]. Such a system is quite expensive and not feasible in some scenarios. The use of active pan-tilt (PT) cameras in such scenarios reduces the number of cameras required for monitoring a given environment. During operation, each PT camera provides a high virtual resolution over a large area, which can potentially track activities over that area and capture high-resolution imagery around the tracked objects. Pan-tilt motion not only widens the camera's viewpoint compared to a static camera but also improves coverage and performance. In addition to surveillance scenarios, such systems are also of relevance to visual systems for car driving and mobile robot navigation, where an active vision system can quickly construct a scene background model and interpret agents and activity in the scene [4].

A network of such active cameras could be used for modeling large scenes and reconstructing events and activities within a large area. Pan-tilt cameras as opposed to static cameras enhance the range and effectiveness of surveillance systems [2], but at the same time background modeling becomes an issue. Conventional background models for static cameras do not use the knowledge of the inter-frame motion, and thus they fail to segment the foreground effectively when there is relative motion between camera and objects.

In this report, we construct a background model for a pan-tilt camera operating in an environment with foreground activity. The algorithm breaks new ground by combining concepts from inter-frame motion estimation and static-camera background modeling. In the first step, various flow algorithms are implemented to obtain the actual shift between frames. The relative motion between camera and object is estimated from the histogram median drawn over the flow vectors between consecutive frames. Using this flow value, a panoramic model is then created with each pixel containing a single Gaussian (SG) model. The SG model gives a change detection output which effectively segments foreground objects from the scene. The change detection output is used to update the background model, which in turn helps us create the mosaiced image.


Figure 1.1: Stationary camera vs. pan-tilt camera. A number of stationary cameras are required to monitor a wide area, while a single pan-tilt camera is sufficient for the task. Note the lines showing the field of view.

    Figure 1.2: Flowchart of the pan-tilt camera based background modeling.


    Chapter 2

    Foreground Extraction

Foreground object segmentation is one of the most challenging issues in computer vision. Its task is to extract moving targets of interest from a given image sequence. It is an essential step in many computer vision applications such as visual surveillance, 3D motion capture, and human-computer interaction. High-quality segmentation benefits further processing and greatly influences the overall performance of the whole system [7]. However, because of many uncontrollable factors in the actual acquisition environment, such as camera movement, lighting changes, shadows, and complex backgrounds, fast and accurate moving-target extraction is a very difficult problem and makes this topic a challenging research field in computer vision.

In most applications, cameras have a fixed position and fixed parameters, so it is possible to maintain a stable background model. Simple background subtraction is the most straightforward segmentation method in such situations: it calculates the difference of each pixel between the current image and the background image, and then detects the moving object through a threshold value. To reflect the temporal changes of the background, statistical background subtraction approaches use the statistical characteristics of individual pixels to construct more accurate background models. For this, a single Gaussian model is considered a sufficient approximation of the per-pixel changing process due to its ability to efficiently segment the foreground.

The single Gaussian model is an effective background model for approximating the variation at any particular pixel. It maintains a mean μ(k) and variance σ²(k) corresponding to each pixel k in image I. It works on a per-pixel basis and tends to produce output containing segmented foreground objects such as people, cars, etc. The segmentation result on a sample data set is shown in Figure 2.2.

In the case of pan-tilt cameras, the single Gaussian modeling technique used for static cameras fails because of the relative motion between camera and object. In order to construct a background model, we need to estimate the flow between consecutive frames. The main reason for this failure is that the single Gaussian model does not use the knowledge of the inter-frame motion, and thus it fails to segment the foreground effectively. This problem exists in all of the conventional background subtraction methods. In order to solve it, we introduce the concept of inter-frame motion estimation, which is discussed in the next chapter.


Figure 2.1: Single Gaussian model output on sample dataset 1.

Figure 2.2: Single Gaussian model output on sample dataset 2.


    Chapter 3

    Inter-Frame Motion Estimation

Optical flow (or optic flow) is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Techniques such as motion detection, object segmentation, time-to-collision and focus-of-expansion calculations, motion-compensated encoding, and stereo disparity measurement utilize this motion of object surfaces and edges.

    3.1 Estimation of optical flow

Sequences of ordered images allow the estimation of motion as either instantaneous image velocities or discrete image displacements. Optical flow methods try to calculate the motion between two image frames taken at times t and t + Δt at every pixel position. These methods are called differential since they are based on local Taylor series approximations of the image signal; that is, they use partial derivatives with respect to the spatial and temporal coordinates.

For a 2D+t dimensional case (3D or n-D cases are similar), a pixel at location (x, y, t) with intensity I(x, y, t) will have moved by Δx, Δy and Δt between the two image frames, and the following image constraint equation can be given:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt)    (3.1)

Assuming the movement to be small, the image constraint at I(x, y, t) can be developed with a Taylor series to get:

I(x + Δx, y + Δy, t + Δt) = I(x, y, t) + (∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt + higher-order terms    (3.2)

From these equations it follows that:

(∂I/∂x)Δx + (∂I/∂y)Δy + (∂I/∂t)Δt = 0    (3.3)

Dividing by Δt,

(∂I/∂x)(Δx/Δt) + (∂I/∂y)(Δy/Δt) + ∂I/∂t = 0    (3.4)

which results in

(∂I/∂x)vx + (∂I/∂y)vy + ∂I/∂t = 0    (3.5)

where vx and vy are the x and y components of the velocity (optical flow) of I(x, y, t), and ∂I/∂x, ∂I/∂y and ∂I/∂t are the derivatives of the image at (x, y, t) in the corresponding directions, written Ix, Iy and It in the following. Thus,

Ix·vx + Iy·vy = −It,  i.e.  (∇I)ᵀ·V = −It    (3.6)

    This is an equation in two unknowns and cannot be solved as such. This is

    known as the aperture problem of the optical flow algorithms. To find the optical

    flow another set of equations is needed, given by some additional constraint. All

    optical flow methods introduce additional conditions for estimating the actual flow

    [5].
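The brightness constancy constraint of Equation 3.5 can be checked numerically. A minimal sketch, assuming NumPy is available: a smooth synthetic image is translated by a known flow (vx, vy), the derivatives are approximated with finite differences, and the residual Ix·vx + Iy·vy + It is verified to be small.

```python
import numpy as np

# Synthetic smooth image: I(x, y) = sin(0.1x + 0.2y)
x, y = np.meshgrid(np.arange(64, dtype=float), np.arange(64, dtype=float))
frame1 = np.sin(0.1 * x + 0.2 * y)

# The second frame is the first translated by a known flow (vx, vy)
vx, vy = 1.0, 0.0
frame2 = np.sin(0.1 * (x - vx) + 0.2 * (y - vy))

# Finite-difference approximations of the derivatives
Ix = np.gradient(frame1, axis=1)   # spatial derivative along x (columns)
Iy = np.gradient(frame1, axis=0)   # spatial derivative along y (rows)
It = frame2 - frame1               # temporal derivative

# The residual of Eq. 3.5 should be near zero (up to Taylor truncation error)
residual = Ix * vx + Iy * vy + It
print(np.abs(residual[2:-2, 2:-2]).max())
```

The residual is an order of magnitude smaller than |It| itself, confirming the first-order approximation for small displacements.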


    3.2 Algorithms for calculating optical flow

Some general algorithms for calculating optical flow are:

    1. Horn Schunck [1]

    2. Lucas Kanade [6]

    3. Pyramidal Lucas Kanade

    3.2.1 Horn Schunck

    The Horn-Schunck method of estimating optical flow is a global method which in-

    troduces a global constraint of smoothness to solve the aperture problem.

    It assumes smoothness in the flow over the whole image. Thus, it tries to

    minimize distortions in flow and prefers solutions which show more smoothness.

The flow is formulated as a global energy functional which is then minimized. This functional is given for two-dimensional image streams as:

E = ∬ [ (Ix·u + Iy·v + It)² + α²(‖∇u‖² + ‖∇v‖²) ] dx dy    (3.7)

where Ix, Iy and It are the derivatives of the image intensity values along the x, y and time dimensions respectively, V = [u, v]ᵀ is the optical flow vector, and the parameter α is a regularization constant. Larger values of α lead to a smoother flow. This functional can be minimized by solving the associated Euler-Lagrange equations. These are

∂L/∂u − (∂/∂x)(∂L/∂ux) − (∂/∂y)(∂L/∂uy) = 0    (3.8)

∂L/∂v − (∂/∂x)(∂L/∂vx) − (∂/∂y)(∂L/∂vy) = 0    (3.9)

    where L is the integrand of the energy expression, giving

  • 8/8/2019 Bachelor Thesis - Aneesh Sharma

    17/32

    CHAPTER 3. INTER-FRAME MOTION ESTIMATION 10

Ix(Ix·u + Iy·v + It) − α²Δu = 0    (3.10)

Iy(Ix·u + Iy·v + It) − α²Δv = 0    (3.11)

where subscripts again denote partial differentiation and Δ = ∂²/∂x² + ∂²/∂y² denotes the Laplace operator. In practice the Laplacian is approximated numerically using finite differences, and may be written Δu(x, y) = ū(x, y) − u(x, y), where ū(x, y) is a weighted average of u calculated in a neighborhood around the pixel at location (x, y).

    However, since the solution depends on the neighboring values of the flow field, it

    must be repeated once the neighbors have been updated. The following iterative

    scheme is derived:

u^(k+1) = ū^k − Ix·(Ix·ū^k + Iy·v̄^k + It) / (α² + Ix² + Iy²)    (3.12)

v^(k+1) = v̄^k − Iy·(Ix·ū^k + Iy·v̄^k + It) / (α² + Ix² + Iy²)    (3.13)

where the superscript k + 1 denotes the next iteration to be calculated and k the last calculated result. This is in essence the Jacobi method applied to the large, sparse system arising when solving for all pixels simultaneously.

Properties: An advantage of the Horn-Schunck algorithm is that it yields a high density of flow vectors, i.e. the flow information missing in the inner parts of homogeneous objects is filled in from the motion boundaries. On the negative side, it is more sensitive to noise than local methods.
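The iterative scheme of Equations 3.12-3.13 can be sketched compactly. This is a minimal NumPy illustration, not the exact implementation used in this work: the weighted average ū is approximated here by a plain 4-neighbor mean, and the derivative stencils are simple finite differences.

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Sketch of the Horn-Schunck Jacobi iteration (Eqs. 3.12-3.13).

    I1, I2 : consecutive grayscale frames as float arrays.
    alpha  : regularization constant (larger -> smoother flow).
    Returns the flow components (u, v), one value per pixel.
    """
    Ix = np.gradient(I1, axis=1)       # spatial derivative along x
    Iy = np.gradient(I1, axis=0)       # spatial derivative along y
    It = I2 - I1                       # temporal derivative
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def local_avg(f):
        # Simple 4-neighbor mean standing in for the weighted average
        # u-bar of the update equations (borders replicated)
        fp = np.pad(f, 1, mode='edge')
        return (fp[:-2, 1:-1] + fp[2:, 1:-1]
                + fp[1:-1, :-2] + fp[1:-1, 2:]) / 4.0

    for _ in range(n_iter):
        u_avg, v_avg = local_avg(u), local_avg(v)
        common = (Ix * u_avg + Iy * v_avg + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_avg - Ix * common        # Eq. 3.12
        v = v_avg - Iy * common        # Eq. 3.13
    return u, v
```

On synthetic translated inputs, the interior flow tends toward the true shift as the number of iterations grows; for identical frames the flow is exactly zero.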

    3.2.2 Lucas Kanade

Figure 3.1: Horn-Schunck output showing flow vectors in the x and y directions.

In computer vision, the Lucas-Kanade method is a two-frame differential method for optical flow estimation developed by Bruce D. Lucas and Takeo Kanade. It introduces an additional constraint by assuming the flow to be constant in a local neighborhood around the central pixel under consideration at any given time.

The Lucas-Kanade method is still one of the most popular two-frame differential methods for motion estimation (also called optical flow estimation). The solution assumes a locally constant flow. The method is based upon the optical flow equation (3.5). The additional constraint needed for the estimation of the flow field is introduced by assuming that the flow (vx, vy) is constant in a small window of size m × m with m > 1, centered at pixel (x, y). Numbering the pixels within the window as 1 . . . n, n = m², a set of equations can be found:

Ix(q1)·vx + Iy(q1)·vy = −It(q1)
. . .
Ix(qn)·vx + Iy(qn)·vy = −It(qn)

where q1, . . . , qn are the pixels inside the window. This over-determined system is solved in the least-squares sense via the normal equations, V = (AᵀA)⁻¹Aᵀb, where the rows of A are [Ix(qi), Iy(qi)] and bi = −It(qi).
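The least-squares solution for a single window can be sketched in a few lines, assuming NumPy (the function name is illustrative):

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Least-squares flow estimate for a single m x m window.

    Ix, Iy, It : spatial and temporal derivative values at the
                 n = m*m pixels of the window (flattened internally).
    Returns (vx, vy).
    """
    A = np.stack([np.ravel(Ix), np.ravel(Iy)], axis=1)  # n x 2 matrix
    b = -np.ravel(It)                                   # right-hand side
    # Solves the normal equations (A^T A) v = A^T b; the 2x2 system is
    # ill-conditioned when all gradients in the window point the same
    # way -- the aperture problem reappearing locally
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v
```

With derivatives generated from a pure translation, the recovered (vx, vy) matches the imposed shift exactly, since the system is then consistent.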


    3.2.3 Pyramidal Lucas Kanade

As generally more equations are available for flow estimation than needed (an over-determined system), the Lucas-Kanade algorithm can be used in combination with statistical methods to improve performance in the presence of outliers, as in noisy images. A statistical analysis marks the outliers, and the flow is then estimated based on the remaining equations, or the equations are weighted accordingly.

When applied to image registration, such as stereo matching or images with large displacements, the Lucas-Kanade method is usually carried out in a coarse-to-fine iterative manner: the spatial derivatives are first computed at a coarse scale in scale-space (or a pyramid), one of the images is warped by the computed deformation, and iterative updates are then computed at successively finer scales. One characteristic of the Lucas-Kanade algorithm, shared by other local optical flow algorithms, is that it does not yield a very high density of flow vectors, i.e. the flow information fades out quickly across motion boundaries, and the inner parts of large homogeneous areas show little or no motion. Its advantage is its comparative robustness in the presence of noise.

I^L(x, y) = (1/4)·I^(L−1)(2x, 2y)
  + (1/8)·[I^(L−1)(2x − 1, 2y) + I^(L−1)(2x + 1, 2y) + I^(L−1)(2x, 2y − 1) + I^(L−1)(2x, 2y + 1)]
  + (1/16)·[I^(L−1)(2x − 1, 2y − 1) + I^(L−1)(2x + 1, 2y − 1) + I^(L−1)(2x − 1, 2y + 1) + I^(L−1)(2x + 1, 2y + 1)]    (3.16)

Equation 3.16 defines the pyramid representation of a generic image I of size nx × ny. Let I^0 = I be the zeroth-level image; this is essentially the highest-resolution (raw) image, and its width and height define the level-0 size. The pyramid representation is then built in a recursive fashion: compute I^1 from I^0, then compute I^2 from I^1, and so on. Let L = 1, 2, . . . be a generic pyramid level, and let I^(L−1) be the image at level L − 1.
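Equation 3.16 amounts to smoothing with the 3 × 3 kernel [1/16 1/8 1/16; 1/8 1/4 1/8; 1/16 1/8 1/16] and subsampling by two in each direction. A minimal NumPy sketch (function names illustrative; borders handled by replication):

```python
import numpy as np

def pyramid_down(I):
    """One level of the image pyramid of Eq. 3.16: weighted 3x3
    smoothing followed by 2x subsampling."""
    I = np.asarray(I, dtype=float)
    Ip = np.pad(I, 1, mode='edge')            # replicate borders
    acc = (Ip[1:-1, 1:-1] / 4.0               # center, weight 1/4
           + (Ip[:-2, 1:-1] + Ip[2:, 1:-1]    # 4-neighbors, weight 1/8
              + Ip[1:-1, :-2] + Ip[1:-1, 2:]) / 8.0
           + (Ip[:-2, :-2] + Ip[:-2, 2:]      # corners, weight 1/16
              + Ip[2:, :-2] + Ip[2:, 2:]) / 16.0)
    return acc[::2, ::2]                       # subsample by 2

def build_pyramid(I, levels):
    """I^0 = I; each I^L is computed recursively from I^(L-1)."""
    pyr = [np.asarray(I, dtype=float)]
    for _ in range(levels):
        pyr.append(pyramid_down(pyr[-1]))
    return pyr
```

The kernel weights sum to one (1/4 + 4·1/8 + 4·1/16 = 1), so a constant image stays constant at every level, and each level halves the image dimensions.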


Comparisons

• The LK method dominates over HS in terms of performance and complexity.

• The pyramidal LK method is considerably more efficient than HS in terms of computation time.

• In pyramidal LK, the whole image is broken down into different layers (pyramids).

• Pyramidal LK uses features to estimate the optical flow between frames.

    Figure 3.3: Lucas Kanade Pyramidal output showing flow vectors in x and y direction.


    Chapter 4

    Mosaiced Background Model

This chapter deals with efficient background model construction for pan-tilt cameras. As there is relative motion between the camera and objects, conventional methods of background modeling fail here. The proposed algorithm combines concepts from the conventional background model and the inter-frame motion, and can be divided into the following stages:

• Flow estimation between frames

• Segmentation of foreground from background

• Learning the background model

    4.1 Flow estimation

Flow estimation gives a complete view of the relative shift between camera and object, and thus helps in the construction of the background model. The flow value between two consecutive frames indicates the overlapping and non-overlapping regions. There are various existing methods for calculating flow between frames (Chapter 3).

Figure 4.1: Step-by-step method of background modeling.

Figure 4.2: Flow density, with the x and y axes showing flows in different directions.

Among these methods, the pyramidal Lucas-Kanade approach is an effective and robust way to compute flow between frames. It is usually carried out in a coarse-to-fine iterative manner: the spatial derivatives are first computed at a coarse scale in scale-space (or a pyramid), one of the images is warped by the computed deformation, and iterative updates are then computed at successively finer scales. One characteristic of the Lucas-Kanade algorithm, shared by other local optical flow algorithms, is that it does not yield a very high density of flow vectors, i.e. the flow information fades out quickly across motion boundaries, and the inner parts of large homogeneous areas show little or no motion. Its advantage is its comparative robustness in the presence of noise.
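As described in Chapter 1, the global camera shift is estimated as the median over the per-pixel flow vectors, which is robust to the minority of vectors contaminated by independently moving foreground objects. A minimal sketch, assuming NumPy and dense per-pixel flow components (the function name is illustrative, and `np.median` stands in for the histogram median of the report):

```python
import numpy as np

def camera_shift(flow_x, flow_y):
    """Estimate the global (pan, tilt) pixel shift as the median of the
    per-pixel flow components between two consecutive frames."""
    return (float(np.median(flow_x)), float(np.median(flow_y)))

# Demo: flow dominated by camera motion (3, -1), with a small patch of
# outlier vectors caused by a moving foreground object
fx = np.full((100, 100), 3.0)
fy = np.full((100, 100), -1.0)
fx[:10, :10] = 20.0
fy[:10, :10] = 15.0
print(camera_shift(fx, fy))  # the outlier patch does not move the median
```

As long as the foreground occupies a minority of the pixels, the median returns the background (camera) motion rather than the object motion.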

    4.2 Foreground Segmentation

Foreground segmentation is a fundamental first processing stage for vision systems which monitor real-world activity, and is of great interest in many applications. For instance, in a real-time scenario, moving objects such as human beings can be separated from stationary/non-living objects. Segmentation also allows systems to learn the background effectively and leads to an efficient realization of the 3D world. In 3D multi-camera environments, robust foreground segmentation allows a correct 3-dimensional reconstruction without background artifacts, while in video surveillance tasks it allows correct object identification and tracking.

The objective of foreground segmentation and tracking is to segment the scene into foreground objects and background, and to establish the temporal correspondence of the foreground objects. In this report we focus on techniques based on a classification using statistical models of the background and the foreground. For this reason, we assume that the first frame is a reference consisting of only background objects. Our objective is to improve the models and define an appropriate update of these models so as to reach a correct foreground-background segmentation while minimizing false negatives and false positives. The tracking process makes the correspondence of the segmented objects with the objects being tracked from previous frames. Depending on the technique, the tracking can be clearly separated from the segmentation (when previous foreground information is not used for the segmentation) or can be implicit in the foreground segmentation (when a priori information about the object is used).

The approach used here is foreground segmentation based on background modeling. This is a common technique in which a background model is used to detect foreground regions as exceptions to the background. The background model itself is dynamically updated.

    4.2.1 Single Gaussian Model

We modeled the background by analyzing each pixel (i, j) of the image. The background model consists of a mean and a variance corresponding to each pixel value. As shown in the figure, each pixel is modeled with a Gaussian distribution.

The mean and variance for each frame (μt, σt²) are updated as follows:

μt = α·It + (1 − α)·μt−1    (4.1)

σt² = α·d + (1 − α)·σt−1²    (4.2)

d = (It − μt)²    (4.3)

where It is the value of the pixel under analysis in the current frame, μt and σt² are the mean and variance of the Gaussian distribution respectively, and α is the update rate, which we have chosen as 0.03.

This updating step allows the background model to evolve, making it robust to soft illumination changes, a common situation in outdoor scenarios. For each frame, the pixel value is classified as foreground according to Equation 4.4:

|It − μt| > k·σt    (4.4)

where k is a constant threshold.
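The update and classification rules of Equations 4.1-4.4 can be sketched as a small per-pixel model. This NumPy sketch uses the report's α = 0.03, but the threshold k and the initial variance are illustrative choices not fixed by the report; foreground pixels are skipped during learning, as in the mosaicing stage.

```python
import numpy as np

class SingleGaussianModel:
    """Per-pixel single Gaussian background model (Eqs. 4.1-4.4)."""

    def __init__(self, first_frame, alpha=0.03, k=2.5):
        # k = 2.5 and the initial variance of 25 are illustrative values
        self.mean = np.asarray(first_frame, dtype=float).copy()
        self.var = np.full_like(self.mean, 25.0)
        self.alpha, self.k = alpha, k

    def update(self, frame):
        frame = np.asarray(frame, dtype=float)
        # Eq. 4.4: foreground where |I_t - mu_t| > k * sigma_t
        fg = np.abs(frame - self.mean) > self.k * np.sqrt(self.var)
        bg = ~fg
        # Eqs. 4.1-4.3: running update of mean and variance, applied
        # only at pixels classified as background
        d = (frame - self.mean) ** 2
        self.mean[bg] = self.alpha * frame[bg] + (1 - self.alpha) * self.mean[bg]
        self.var[bg] = self.alpha * d[bg] + (1 - self.alpha) * self.var[bg]
        return fg
```

A pixel whose value jumps far beyond k standard deviations of its Gaussian is flagged as foreground and excluded from the update, so transient objects do not corrupt the learned background.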

    4.3 Background Stitching

This is the final step of processing, which leads to the construction of a mosaiced image of the whole background. At this stage, first of all, a background model of twice the height and three times the width of a regular frame is constructed, assuming that the maximum shifts along the x and y directions will always be less than the frame width and half the frame height respectively. The first frame in the data set is taken as a reference and learned as it is. Each subsequent frame is classified into overlapping and non-overlapping regions. The non-overlapping (new) regions are learned as they are in the background model, while the overlapping regions are passed through the foreground extraction process and are learned into the background model with the foreground objects skipped. The background model is updated dynamically at each pixel position using its mean and variance. Finally, the output is a mosaiced image consisting of background objects with the foreground parts left out.
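One stitching step can be sketched as follows. This is a simplified mean-only NumPy illustration (the actual model also carries the per-pixel variance of Section 4.2.1); the function name, the NaN convention for uninitialized pixels, and the offset handling are illustrative.

```python
import numpy as np

def stitch_step(mosaic, frame, offset, fg_mask, alpha=0.03):
    """One step of background stitching (mean-only sketch).

    mosaic  : large canvas, NaN where uninitialized (rendered black in
              the report's figures)
    frame   : current grayscale frame as a float array
    offset  : cumulative (row, col) shift of this frame inside the
              mosaic, accumulated from the inter-frame flow estimates
    fg_mask : boolean mask of pixels segmented as foreground; these are
              skipped so moving objects are left out of the mosaic
    """
    r, c = offset
    h, w = frame.shape
    region = mosaic[r:r + h, c:c + w]          # view into the canvas
    new = np.isnan(region)                     # non-overlapping region
    region[new] = frame[new]                   # new regions learned as-is
    overlap = ~new & ~fg_mask                  # overlapping background
    region[overlap] = (alpha * frame[overlap]
                       + (1 - alpha) * region[overlap])
    return mosaic
```

New regions enter the mosaic directly, while already-seen background pixels are blended with the update rate α, and foreground pixels leave the mosaic untouched.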


    Figure 4.3: LK Pyramidal flow output.

    Figure 4.4: SG Model output. Black part represents uninitialized region


    Figure 4.5: Mosaic output of test case. Black represents uninitialized region


    Chapter 5

    Conclusion

This report extends work on mosaic-based background modeling using non-stationary cameras in several novel ways. Conventional background models do not use the knowledge of the inter-frame motion, and thus they fail to segment the foreground effectively when there is relative motion between camera and objects. The algorithm breaks new ground by combining concepts from both inter-frame motion estimation and the static-camera background model.

The proposed algorithm works in the following steps:

• determining the camera motion using the LK-based pyramidal approach,

• segmenting foreground from background using the single Gaussian model,

• learning the mosaiced background online.

The pan-tilt camera samples sub-regions of a larger scene whose background mosaic is obtained by stitching the pixel-wise intensity distributions of the background regions. We estimate the flow as the median value of the flow vectors computed using the pyramidal LK approach. The flow value gives an estimate of the shift between object and camera. The flow values in the x and y directions are combined with the single Gaussian modeling to generate an online background learning model for pan-tilt cameras. The modified single Gaussian model uses the flow values to determine the region of learning, which means regions segmented as foreground are skipped during the background learning process. Thus, objects moving within the frame are most likely to be skipped during background construction.

However, this is clearly a very initial approach in the area of surveillance systems. One of the main areas we would like to work on is the foreground detection process. Our approach uses a single Gaussian model, which produces unstable and error-prone results, including false holes in detected objects caused by camera noise. The single Gaussian model does not use prior knowledge of the relationships between pixels, which seems the obvious reason for the noisy output. Because of these shortcomings of the simple background subtraction method, an alternative like the GMM (Gaussian Mixture Model) would greatly enhance the working of the algorithm [8]. A GMM is a probabilistic model for density estimation using a mixture distribution, generalizing the single Gaussian model. Since GMMs provide very good performance and interesting properties as a classifier, our future work is directed towards the implementation of a GMM to efficiently detect the foreground.


    Bibliography

[1] Berthold K.P. Horn and Brian G. Schunck. Determining optical flow. In Artificial Intelligence, volume 17, 1981.

[2] Arindam Biswas, Prithwijit Guha, Amitabha Mukerjee, and K.S. Venkatesh. Intrusion detection and tracking with pan-tilt cameras. In Proceedings of the Third International Conference on Visual Information Engineering, 2006.

[3] A.F. Bobick. Movement, activity, and action: The role of knowledge in the perception of motion. Philosophical Transactions of the Royal Society London B, 1997.

[4] D. Gutchess, M. Trajkovics, E. Cohen-Solal, D. Lyons, and A.K. Jain. A background model initialization algorithm for video surveillance. In Eighth IEEE International Conference on Computer Vision, volume 1, pages 733-740, July 2001.

[5] J.L. Barron, D.J. Fleet, and S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision, volume 12, pages 43-77, 1994.

[6] B.D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, pages 674-679, 1981.

[7] C. Stauffer and W.E.L. Grimson. Adaptive background mixture models for real-time tracking. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, page 252, June 1999.

[8] Z. Zivkovic. Improved adaptive Gaussian mixture model for background subtraction. In Proceedings of the 17th International Conference on Pattern Recognition, volume 2, pages 28-31, 2004.