

A Bayesian algorithm for tracking multiple moving objects in outdoor surveillance video

Manjunath Narayana, University of Kansas, Lawrence, Kansas, [email protected]
Donna Haverkamp, University of Kansas, Lawrence, Kansas, [email protected]

Abstract

Reliable tracking of multiple moving objects in video is an interesting challenge, made difficult in real-world video by various sources of noise and uncertainty. We propose a Bayesian approach to find correspondences between moving objects over frames. By using color values and position information of the moving objects as observations, we probabilistically assign tracks to those objects. We allow for tracks to be lost and then recovered when they resurface. The probabilistic assignment method, along with the ability to recover lost tracks, adds robustness to the tracking system. We present results that show that the Bayesian method performs well in difficult tracking cases, and we compare the probabilistic results to a Euclidean distance-based method.

1. Introduction

The ability to track a particular object or objects in successive frames is an important step in object tracking and classification applications. In many tracking applications, whether in the visible or non-visible spectrum, multiple target objects must be analyzed at each time step. Finding reliable correspondences from objects in one time step to objects in the next is a non-trivial task, given the non-deterministic nature of the subjects, their motion, and the image capture process itself. The task is especially difficult in the presence of noisy sensor measurements, occlusion by other objects in the field of view, and changes in the orientation and direction of movement of the objects.

In the work discussed in this paper, we tackle the issue of real-time multiple-object correspondence in outdoor stationary-camera scenes. While our application is in the visible spectrum and uses the color information of objects, the method could be extended to the non-visible spectrum by using appropriate spectral information. We begin with the detection of moving pixels using subtraction from a moving-average background model. The moving pixels are clustered into regions, which we refer to as blobs, so that pixels belonging to a single object are grouped together.

Once moving regions have been identified, the next task is to generate tracks of these objects over successive frames. This is essentially a correspondence task, the goal of which is to find a match for each blob or object in the previous frame among the blobs or objects in the current frame. We use Bayesian inference based on the color and position information of the moving objects. When the information from a frame does not support the presence of an object, we allow the corresponding track to be declared 'lost'. If the object resurfaces in subsequent frames, the system reassigns the track to it. Thus, the Bayesian method is able to handle occlusion, disappearance of objects, sudden changes in object velocity, and changes in object color profile and size.

2. Previous Work

Since moving objects are typically the primary source of information in surveillance video, most methods focus on the detection of such objects. Common methods for motion segmentation include average background modeling [1], Gaussian background modeling [2], and optical flow methods [3]-[5]. Simple background subtraction is a widely used method for detecting moving objects; however, the background image is not always known and often needs to be generated automatically by the system [1]. In our method, we use an average background model.

Traditionally, Bayesian tracking methods involve generating a reference model of the object being tracked, then looking for the best match for the model in each frame by using predictive approaches such as Kalman filters [6][7] and sampling-based approaches such as Condensation [8] and the Joint Probabilistic Data Association Filter (JPDAF) [7][9]. Instead, we perform motion segmentation to detect objects for tracking. This leads to a fundamental difference in the way the posterior probability is calculated in the JPDAF versus our method. In the JPDAF, the posterior probability of associating each observation to each track is calculated by applying the Bayesian formula to the track model equations; in other words, the JPDAF calculates the probability of blob-to-track association given the track models. We approach the assignment differently, calculating the probability of blob-to-track association given the previous frame's blob information. Thus, we do not need to maintain track models for each track. We probabilistically assign tracks to blobs by using the color values and position information of the segmented blobs as observations in the Bayes formula.

3. Motion Segmentation

We use a moving-average model in which the average intensity value of each pixel over a window of N frames (typically 150) is regarded as the background model. Moving objects do not contribute much to the average intensity value [10]; thus, the model becomes a reasonable estimate of the background.

If I_k(y, x) is the intensity level at coordinates Y = y (column) and X = x (row) of the k-th frame in the sequence, and bg_k(y, x) is the average background model value at (y, x) for frame k, then

    bg_k(y, x) = (1/N) * \sum_{j = k - (N/2)}^{k + (N/2)} I_j(y, x)    (1)

The segmented output is:

    seg_k(y, x) = 1 if bg_k(y, x) - I_k(y, x) > T, and 0 otherwise    (2)

where T is a predetermined threshold, typically 10.

Following segmentation, region growing is used to locate each moving object. An envelope is established around each object of interest by use of morphological erosion and dilation filters. Additional noise is filtered by ignoring blobs less than S pixels (typically 20) in size.
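As a concrete illustration, the background model of Eq. (1), the thresholding of Eq. (2), and the blob-size filter can be sketched in a few lines of NumPy. This is a minimal sketch under assumed conventions, not the authors' implementation: grayscale frames are stored in a 3-D array, the window is clipped at sequence boundaries, and the absolute difference is used in place of the signed difference of Eq. (2) so that both brightening and darkening pixels are flagged. Function names are illustrative.

```python
import numpy as np

def background_model(frames, k, N=150):
    # Eq. (1): mean intensity over a window of N frames centered on
    # frame k, clipped to the available sequence.
    lo = max(0, k - N // 2)
    hi = min(len(frames), k + N // 2 + 1)
    return frames[lo:hi].mean(axis=0)

def segment(frame, bg, T=10):
    # Eq. (2): mark a pixel as moving when its background difference
    # exceeds the threshold T (absolute difference used here).
    return (np.abs(bg - frame) > T).astype(np.uint8)

def filter_small_blobs(labels, sizes, S=20):
    # Noise-filtering step: ignore blobs smaller than S pixels.
    # `labels` is a blob-label image; `sizes` maps label -> pixel count.
    keep = {lab for lab, n in sizes.items() if n >= S}
    return np.where(np.isin(labels, list(keep)), labels, 0)
```

The blob labeling itself (region growing plus morphological erosion and dilation) is omitted; a connected-components routine such as `scipy.ndimage.label` would typically supply `labels` and `sizes`.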

4. Object Tracking

We perform object tracking by finding correspondences between blobs in the previous frame and blobs in the current frame. In a noiseless environment with perfect segmentation, the task is straightforward because a one-to-one correspondence can usually be found for a blob in two successive frames. In reality, however, there is often a great deal of uncertainty in the final segmentation. Failure to detect small yet valid blobs due to their size, occlusion of valid blobs by other objects in the scene, changes in object orientation, entry (or exit) of objects in the scene, and camera jitter are the main sources of uncertainty. These factors make correspondence a difficult problem to solve, particularly for videos shot outdoors. Our goal is to develop a correspondence scheme that is robust to these sources of uncertainty.

4.1.1 General tracking approach
For each motion blob in the first frame, a new track is created. In subsequent frames, matching blobs are sought for each track from the previous frame. Matched tracks are updated with position and color information from the matching blob. We allow tracks to be declared 'lost' if any of the factors mentioned above causes the algorithm to fail to find a suitable match for a track. The track is not deleted but is kept alive for a few frames in the hope that a match will be found. A track is deleted only if a suitable match is not found for L (typically 5) consecutive frames. If there is any blob in the new frame that cannot be matched to an existing track, a new track is generated. A list of currently 'active' tracks is maintained for each time step/frame.

4.1.2 The correspondence algorithm
Commonly, a distance-based match matrix is used to solve the correspondence problem, as in [10]. To facilitate track correspondence, we instead designed a Bayesian inference method to find matching blobs in consecutive frames. We also calculate the probability that a track is 'lost' in the current frame. The basic idea behind our approach is that, given an object (blob) in the current frame, there is some probability that a blob of similar color and position will be observed in the next frame. Using Bayes' formula, we can find the probability of a blob belonging to a track from the previous frame, given the blob's color and position observations.

The sub-problem can be formulated thus: if T^{k-1} = {t_1^{k-1}, t_2^{k-1}, ..., t_{u_{k-1}}^{k-1}} is the set of tracks in frame (k-1) and O^k = {o_1^k, o_2^k, ..., o_{v_k}^k} is the set of moving blobs in frame k, what is the best possible match between the tracks and blobs? Here, t_j^{k-1} is the j-th track, u_{k-1} is the total number of tracks in frame (k-1), o_i^k is the i-th blob found in frame k, and v_k is the total number of moving blobs in frame k. In other words, what is the probability of assignment of track t_j^{k-1} to blob o_i^k, for all j and i? An increase in the probability of assignment of track t_j^{k-1} to a blob o_i^k should automatically reduce the probability of assignment of track t_j^{k-1} to other blobs o_m^k, where m ≠ i. If no suitable match for t_j^{k-1} is found, then the probability that it is 'lost' should be made high.

By looking at the observations for all elements t_j^{k-1} ∈ T^{k-1} and o_i^k ∈ O^k, we update a belief matrix which contains the probability of match for each track to each candidate blob. The probabilistic network used to calculate this belief matrix is given in Fig. 1.

For ease of notation, we now refer to the event "track t_j^{k-1} associated with blob o_i^k" as 'Assign'. The event that track t_j^{k-1} is not associated with the current candidate blob o_i^k is called 'NotAssign'. 'NotAssign' implies that track t_j^{k-1} has been associated with another blob o_m^k (where m ≠ i) or has been declared 'lost'.
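The track lifecycle of Sec. 4.1.1 (match, keep 'lost' tracks alive for L frames, delete, create) can be sketched as follows. This is a hypothetical minimal structure, not the authors' implementation; the `Track` fields and function names are illustrative.

```python
from dataclasses import dataclass

L_MAX = 5  # L in the text: consecutive unmatched frames before deletion

@dataclass
class Track:
    track_id: int
    position: tuple  # (y, x) of the last matched blob
    color: tuple     # mean (R, G, B) of the last matched blob
    lost_count: int = 0

def advance(tracks, assignment, unmatched_blobs, next_id):
    # One frame of the lifecycle. `assignment` maps track_id to a matched
    # (position, color) pair; tracks absent from it are 'lost' this frame.
    survivors = []
    for t in tracks:
        blob = assignment.get(t.track_id)
        if blob is not None:
            t.position, t.color = blob  # matched: refresh track state
            t.lost_count = 0
            survivors.append(t)
        elif t.lost_count + 1 < L_MAX:
            t.lost_count += 1           # kept alive in hope of a rematch
            survivors.append(t)
        # else: unmatched for L_MAX consecutive frames -> deleted
    for blob in unmatched_blobs:        # unexplained blobs start new tracks
        survivors.append(Track(next_id, *blob))
        next_id += 1
    return survivors, next_id
```

A track recovered within L_MAX frames simply resumes under its original identifier, which is how the system reassigns a track to an object that resurfaces.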

4.1.3 Method and updating formulae
In the Bayes network, R, G, and B are the color observations: R is the difference between the mean Red values, G the difference between the mean Green values, and B the difference between the mean Blue values of blob o_i^k and track t_j^{k-1}. Together, R, G, and B form the color observation c: c = {R = r, G = g, B = b}. Y and X are the differences between the position of blob o_i^k and the predicted position of track t_j^{k-1} in the y and x coordinates, respectively; a simple estimate based on the current velocity and current position of the track is used as its predicted position. Together, Y and X form the position observation d: d = {Y = y, X = x}.

Though the R, G, and B values are not independent in nature, for simplicity we assume independence among the R, G, and B observations. The assumption of independence does not affect the results adversely. Thus,

    p(c) = p(R = r) × p(G = g) × p(B = b)    (3)

Similarly, the Y and X independence assumption leads to

    p(d) = p(Y = y) × p(X = x)    (4)

By the same independence assumptions,

    p(c | Assign) = p(R = r | Assign) × p(G = g | Assign) × p(B = b | Assign)    (5)

    p(d | Assign) = p(Y = y | Assign) × p(X = x | Assign)    (6)

Also, we set

    p(c | NotAssign) = 0.1    (7)

    p(d | NotAssign) = 0.1    (8)

These are the probabilities that a given observation is seen in blob o_i^k without a corresponding track t_j^{k-1} existing in the previous frame. These numbers are small, reflecting that the probability of finding a blob of a given color and position without a corresponding track in the previous frame is low.

Depending on the state of the color and position observations of blob o_i^k, the posterior probability of track t_j^{k-1} being assigned to blob o_i^k is calculated. The prior probability for the assignment is:

    p(t_j^{k-1} → o_i^k) = p(Assign) = 1 / (v_k + 1)    (9)

where v_k is the number of moving blobs in frame k. Note: the factor (v_k + 1) implies that track t_j^{k-1} has an equal prior probability of being associated with any of the v_k blobs of frame k or of being declared 'lost'.

Fig. 1. Probabilistic network for track assignment


p(c) is given by:

    p(c) = p(c | Assign) × p(Assign) + p(c | NotAssign) × p(NotAssign)    (10)

Similarly, p(d) is given by:

    p(d) = p(d | Assign) × p(Assign) + p(d | NotAssign) × p(NotAssign)    (11)

where p(NotAssign) = 1 − p(Assign). The posterior probability of a track being assigned to a blob, given the blob's color observation, is, by Bayes' formula,

    p(Assign | c) = p(c | Assign) × p(Assign) / p(c)    (12)

Similarly, the posterior probability of a track being assigned to a blob, given the blob's position observation, is

    p(Assign | d) = p(d | Assign) × p(Assign) / p(d)    (13)
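Eqs. (10)-(13) amount to a standard two-hypothesis Bayes update, which can be written compactly as below. A minimal sketch; the likelihood arguments stand in for the PDF lookups of Sec. 4.1.5, and the function name is illustrative.

```python
def posterior_assign(lik_assign, lik_not_assign, p_assign):
    # Eqs. (10)/(11): the evidence p(obs) marginalizes over the two
    # hypotheses Assign and NotAssign.
    p_not_assign = 1.0 - p_assign
    evidence = lik_assign * p_assign + lik_not_assign * p_not_assign
    # Eqs. (12)/(13): Bayes' formula for p(Assign | observation).
    return lik_assign * p_assign / evidence
```

With the paper's constants, a color observation with likelihood 0.9 under 'Assign', p(c | NotAssign) = 0.1 as in Eq. (7), and the uniform prior 1/(v_k + 1) of Eq. (9) with v_k = 2, the update raises the assignment probability from 1/3 to 9/11; an uninformative observation (both likelihoods 0.1) leaves the prior unchanged.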

4.1.4 Belief Matrix and update
The Belief Matrix is a u_{k-1} rows by (v_k + 1) columns matrix. It contains the probabilities of each track being assigned to each blob and also of being declared lost, hence the (v_k + 1) columns. Denoted by BP_{ji}, Belief is the probability of assignment of track j to blob i, i.e., p(Assign j → i).

For each frame k, the following procedure is followed to update the belief matrix:

1. Initialize the Belief Matrix to 1/(v_k + 1):

    BP_{ji}(0) = 1 / (v_k + 1),  j ∈ {1, 2, 3, ..., u_{k-1}},  i ∈ {1, 2, 3, ..., v_k + 1}    (14)

where u_{k-1} is the number of tracks in frame k−1, v_k is the number of moving blobs in frame k, and BP_{ji}(0) is the initial value of the Belief Matrix at row j and column i.

2. Iterate through all tracks and blobs: for each track t_j^{k-1} ∈ T^{k-1} and for each blob o_i^k ∈ O^k:
   a. Calculate BP_{ji}(n) based on the color observation of o_i^k, where BP_{ji}(n) stands for the value of the Belief Matrix at row j and column i due to the observation from the n-th blob (which is o_i^k). Note that the value of n is always equal to the value of i.
   b. Normalize or update the other beliefs BP_{jm}(n), m ≠ i.
   c. Recalculate BP_{ji}(n) based on the position observation of o_i^k.
   d. Normalize or update the other beliefs BP_{jm}(n), m ≠ i.

3. Match tracks and blobs based on the updated Belief Matrix.

The Belief calculation based on the color observation is:

    BP_{ji}(n) = p(c | Assign) × BP_{ji}(n−1) / p(c)    (15)

where BP_{ji}(n) is the Belief BP_{ji} at the end of the observation from the n-th blob (which is o_i^k). As noted before, the value of n is always equal to the value of i. Equation (15) is obtained by replacing p(Assign | c) and p(Assign) with BP_{ji}(n) and BP_{ji}(n−1), respectively, in the posterior equation (12). This follows from the fact that BP_{ji}(n) is the posterior probability after the color observation from o_i^k, and BP_{ji}(n−1) is the prior before the color observation from o_i^k.

Similarly, the Belief calculation based on the position observation is:

    BP_{ji}(n) = p(d | Assign) × BP_{ji}(n−1) / p(d)    (16)

After each blob observation o_i^k, it is necessary to normalize the Belief Matrix row so that

    \sum_{m=1}^{v_k + 1} BP_{jm}(n) = 1

We propose the following update formula: for all m ≠ i,

    Belief_new(j → m) = Belief_old(j → m) × [1 + Δp(NotAssign j → i) / p(NotAssign j → i)_old]    (17)

This formula ensures that all Beliefs are altered proportionally while maintaining the probability requirement \sum_{i=1}^{v_k + 1} BP_{ji}(n) = 1.

Using the following formal definitions,

    Belief_old(j → m) = BP_{jm}(n−1)
    Belief_new(j → m) = BP_{jm}(n)
    p(Assign j → i) = BP_{ji}(n)
    p(NotAssign j → i)_old = 1 − BP_{ji}(n−1)
    Δp(NotAssign j → i) = −(BP_{ji}(n) − BP_{ji}(n−1)) = −ΔBP_{ji}

equation (17) becomes

    BP_{jm}^{new} = BP_{jm}^{old} × [1 − ΔBP_{ji} / (1 − BP_{ji}^{old})]    (18)
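The initialization of Eq. (14), the Bayes update of Eqs. (15)/(16), and the proportional rescaling of Eq. (18) can be combined into a single observation step, sketched below. A minimal sketch with illustrative names: the 'lost' state occupies the last column, the likelihood arguments stand in for the PDF lookups, and the current belief is assumed to be strictly less than 1.

```python
import numpy as np

def init_belief(u_prev, v_k):
    # Eq. (14): uniform belief over the v_k blobs plus one 'lost' column.
    return np.full((u_prev, v_k + 1), 1.0 / (v_k + 1))

def observe(BP, j, i, lik_assign, lik_not_assign=0.1):
    # Eq. (15)/(16): Bayes-update entry (j, i) from one observation,
    # with the evidence computed as in Eq. (10)/(11).
    old = BP[j, i]
    evidence = lik_assign * old + lik_not_assign * (1.0 - old)
    new = lik_assign * old / evidence
    # Eq. (18): rescale the other entries of row j so it still sums to 1.
    delta = new - old
    for m in range(BP.shape[1]):
        if m != i:
            BP[j, m] *= 1.0 - delta / (1.0 - old)
    BP[j, i] = new
    return BP
```

After every observation the row sums remain exactly 1, which is the invariant Eq. (17) was designed to maintain: the mass gained by entry (j, i) is taken from the other entries of row j in proportion to their current values.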

Given the belief matrix, a sample of which is illustrated in Fig. 2a, the best possible assignment is determined. The matrix rows correspond to active tracks from the previous frame and the columns correspond to detected blobs in the current frame. Fig. 2b shows the resulting track assignments for a single frame: blob 1 is assigned to track 12, and blob 2 to track 7. Tracks 3 and 11, which are active from previous frames, are declared 'lost' in frame 0240.

4.1.5 PDF's used for R, G, and B; X and Y
By manually observing differences in color and position in object tracks over a few hundred consecutive frames of the data, appropriate Probability Distribution Functions (PDF's) were determined for the color and position probabilities. The PDF's are shown in Fig. 3.

5. Results

Samples from the tracking results are shown in Figs. 4 and 5. Fig. 4 shows tracks 03 and 07 for two cars through an occluded region. Car 03 is completely occluded in frame 0210. Also, in frames 0210 and 0225, cars 07 and 03 are each incorrectly split into two objects by the background segmentation method. The Bayesian method is capable of recovering from such errors, as can be seen in frame 0285, where both cars are correctly identified. Successful tracking of cars in a crowded region is shown in Fig. 5. Objects 41, 42, 43, and 45 are correctly identified in successive frames despite their close spatial proximity. Note that there is also a sudden change in the velocity of object 41 between frames 1650 and 1665; the algorithm is robust to this change because it uses a combination of color and position for track correspondence. Erroneous segmentation of car 41 in frame 1650 (as two objects, 41 and 46) and of car 45 (as two objects, 45 and 47) is corrected by frame 1675.

Fig. 2. (a) Belief matrix (b) Tracks resulting from the belief matrix for frame 0240 of the video sequence

We compare the results of the Bayesian method to a basic Euclidean distance-based method similar to the approach used in [10]. The Bayesian belief matrix and the Euclidean distance matrix for frame 0275 are presented for analysis in Fig. 6. Each value in the Euclidean distance matrix is the sum of the Euclidean distances in color (R, G, and B) and position (y and x coordinates) values between a blob of the current frame and a track from the previous frame; the smaller the distance value, the better the match between the blob and the track. Frame 0275 (Fig. 6b) has two detected objects and three active tracks from previous frames. Frame 0265 (Fig. 6a) shows the earlier track placements. Car 12, which is detected and tracked in frame 0265, is not detected in frame 0275 due to shortcomings of the segmentation algorithm. Also, a new car enters the scene in frame 0275. In the belief matrix, blobs 1 and 2 are the blobs on the left side of frame 0275 and blob 3 is the blob on the right side. From the Bayesian belief matrix, we can infer that blobs 1 and 2 are best assigned to tracks 07 and 03, respectively. Track 12 is declared 'lost' and a new track (14) is generated for blob 3. The Euclidean distance matrix would assign blobs 1 and 2 correctly but would incorrectly assign blob 3 to track 12; from the frames shown, we can see that this would be an error. For every track, the Bayesian method gives the probability of the track being 'lost'. It is not possible to calculate the chances of a track being 'lost' from the Euclidean distance matrix.
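For reference, the Euclidean baseline used in this comparison can be sketched as follows: each matrix entry sums the color-space and position distances between a track and a blob, and each blob is matched to the nearest track. This is a sketch of the general idea only; the exact feature combination in [10] may differ, and the row layout [R, G, B, y, x] is an assumption.

```python
import numpy as np

def euclidean_distance_matrix(tracks, blobs):
    # tracks, blobs: arrays of rows [R, G, B, y, x].
    # Each entry sums the color distance and the position distance.
    tracks = np.asarray(tracks, dtype=float)
    blobs = np.asarray(blobs, dtype=float)
    D = np.zeros((len(tracks), len(blobs)))
    for j, t in enumerate(tracks):
        for i, b in enumerate(blobs):
            D[j, i] = (np.linalg.norm(t[:3] - b[:3])     # color (R, G, B)
                       + np.linalg.norm(t[3:] - b[3:]))  # position (y, x)
    return D
```

Unlike the belief matrix, D has no 'lost' column: every blob is forced onto its nearest track, which is exactly the failure mode discussed above for blob 3 and track 12.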

6. Conclusions and future work

While Bayesian reasoning and networks have been used by many researchers in the area of video surveillance, this paper describes a direct way of using Bayesian inference to solve the correspondence problem in tracking. It is direct in the sense that the color and position of the new blobs are the observations, while the assignment of blobs to known tracks is the hypothesis.

With the provision for allowing tracks to be declared as lost, we add robustness to the system and are able to pick up tracks of objects which may have disappeared momentarily due to occlusion or other reasons.

Though we currently work in RGB space, the method may be easily expanded to other observation spaces such as HSV. It could also be adapted through experimentation to IR imagery using intensity rather than color and adding object size as an observation for finding correspondence. In hyperspectral data, Principal Components Analysis could be done to choose the best bands, and the PDF’s of these bands would be used in place of RGB PDF’s for the Bayesian inference. The Bayesian correspondence method is a generic approach and not specific to color video data. We plan to adapt it to other imaging modalities in the future.

In our work thus far, the PDF's used for the color and position observations were arrived at by manual observation of the sequence. In the future, these PDF's may be learnt automatically over the course of the video.

Also, in our surveillance video, there is significant change in object size and depth from one part of the scene to another. We are currently working on automatically learning these variations based on track motion values.

References

[1] Guohui Li, Jun Zhang, Hongwen Lin, D. Tu, and Maojun Zhang, "A moving object detection approach using integrated background template for smart video sensor," in Proceedings of the 21st IEEE Instrumentation and Measurement Technology Conference, IMTC 04, May 2004, vol. 1, pp. 462-466.

[2] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747-757, Aug. 2000.

[3] J. L. Barron, D. J. Fleet, and S. Beauchemin, “Performance of optical flow techniques,” International Journal of Computer Vision, vol. 12, no. 1, pp. 43–77, 1994.

[4] B. K. P. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, 17, pp. 185-203, 1981.

[5] J. Y. A. Wang and E. H. Adelson, "Spatio-temporal segmentation of video data," in Proceedings of SPIE on Image and Video Processing II, vol. 2182, pp. 120-131, San Jose, Feb. 1994.

Fig. 3. (a) PDF for color observation R (the same distribution applies for color observations G and B) (b) PDF for position observation Y (the same distribution applies for position observation X)

[6] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. American Society of Mechanical Engineers, Journal of Basic Engineering, vol. 82, pp. 35-45, 1960.

[7] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. Academic Press, 1988.

[8] M. Isard and A. Blake, “Condensation - conditional density propagation for visual tracking,” International Journal of Computer Vision, vol. 29, no. 1, pp. 5–28, 1998.

[9] C. Rasmussen and G. D. Hager, “Probabilistic Data Association Methods for Tracking Complex Visual Objects,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 560-576, Jun. 2001.

[10] S. Intille, J. Davis, and A. Bobick, “Real-time closed-world tracking,” IEEE CVPR, pp. 697-703, 1997.


Fig. 4. Cars through an occluded region
Fig. 5. Crowded cars section of video


Fig. 6. Comparison of the Bayesian belief matrix and the Euclidean distance-based matrix for frame 0275. Track numbers assigned for (a) earlier frame 0265 (b) current frame 0275 (c) Bayesian belief matrix for frame 0275 (d) Euclidean distance matrix for frame 0275
