
Video Surveillance Applications Using Multiple Views of a Scene

Michael Meyer and Thorsten Ohmacht
Robert Bosch GmbH

Michael Hötter
Fachhochschule Hannover

ABSTRACT

Automatic video surveillance techniques are used to detect intruders within a scene. This task is mostly reduced to the problem of detecting moving objects by evaluating the image sequence of a monocular camera. An essential problem of this monocular approach is its inability to measure the 3D-size and 3D-position of objects reliably, as object size and velocity are estimated within the 2D-image plane. To include 3D-information about the scene, an approach using a second camera is proposed in this paper, which combines the evaluation of the measurement data of the two cameras using an efficient 3D-scene model. Here, two cameras are used with an overlapping field of view, which represents an installation often found in existing video surveillance applications.

It is shown that using the combined evaluation of the two cameras, the false detection rate in the case of moving shadows, leaves, birds and insects, or blindings can be further reduced compared to a pure monocular evaluation.

Authors' Current Addresses: M. Meyer and T. Ohmacht, Robert Bosch GmbH, Research Institute for Communications (FVISLH), Postfach 77 77 77, 31132 Hildesheim, Germany; M. Hötter, Department of Electrical Engineering, Fachhochschule Hannover, Ricklinger Stadtweg 120, D-30459 Hannover, Germany.

Based on a presentation at the 1998 Carnahan Conference.

0885-8985/99/$10.00 © 1999 IEEE

INTRODUCTION

The application of digital video is of increasing importance in the video surveillance of buildings and grounds. The automatic detection, localization and tracking of moving objects is the essential task in these surveillance applications. Automatic object detection can relieve human observation personnel of a tiresome and tedious occupation.

From a technical point of view, the main task of a video-based detection scheme consists of the detection and tracking of objects which do not belong to the observed scene and which are to be discriminated on the basis of their shape, motion and texture. Here, restrictions concerning the setup of the scene, i.e., properties concerning shape and texture, should be kept to a minimum so that the system remains adaptable to outdoor applications where noticeable but irrelevant image activity may occur. This image activity can be caused by changing weather conditions and must not trigger (false) alarms. Hence, a video system is desired which requires a minimum of specific knowledge about the scene and guarantees a minimum of false alarms while simultaneously keeping the sensitivity of the system as high as possible.

The standard methods for detecting moving objects in a video signal obtained by a static camera are based on the evaluation of temporal differences of subsequent images. These methods are often based on the assumption that moving objects will, in general, cause temporal gray level changes.


Fig. 1. Block Diagram of the Object Detection Concept for the Evaluation of a Single Camera

The achievable performance of these approaches is limited, as temporal signal changes may have numerous other causes, such as irrelevant types of motion (snow, rain, trees in motion), global and local illumination changes, and small camera motions (vibrations).
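As an illustration of this kind of temporal-difference evaluation, the following is a minimal sketch of a block-based change test on two successive grayscale frames. It assumes NumPy arrays as input; the block size and threshold are illustrative values chosen for this sketch, not figures taken from the paper.

```python
import numpy as np

def block_change_mask(frame, previous, block=16, threshold=12.0):
    """Flag image blocks whose mean absolute temporal difference exceeds a threshold.

    frame, previous: 2D grayscale images of identical size (e.g., uint8 arrays).
    Returns one boolean per block; True marks a block with significant gray level change.
    """
    diff = np.abs(frame.astype(np.int16) - previous.astype(np.int16))
    h, w = diff.shape
    hb, wb = h // block, w // block
    # Average the absolute difference inside each block and compare to the threshold.
    means = diff[:hb * block, :wb * block].reshape(hb, block, wb, block).mean(axis=(1, 3))
    return means > threshold
```

Blocks flagged by such a test are the candidates that the object-oriented stages described below would examine further.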

To handle these problems, new techniques have been introduced which aim at increasing the robustness and reliability of the system:

- Texture and motion are additionally evaluated to suppress false alarms [2].

- Image features are described and estimated in an object-oriented instead of a block-based fashion to increase the reliability of relevant object classification [3].

- Typical scene situations are automatically learned by a recursive estimation of the temporal distribution of image features such as texture and motion (see the sketch after this list). In this way the calibration effort remains extremely small [2].

- Object tracking is performed to judge object motion and its relevance [4].
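One common way to realize such a recursive estimate is an exponentially weighted mean and variance per feature, as sketched below. The forgetting factor and the 3-sigma test are illustrative choices made here for clarity, not parameters taken from the paper.

```python
class RecursiveFeatureStats:
    """Exponentially weighted running mean/variance of a scalar image feature."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha      # forgetting factor: larger values adapt faster to scene changes
        self.mean = None
        self.var = None

    def update(self, value):
        """Fold a new observation into the learned temporal distribution."""
        if self.mean is None:   # first observation initializes the model
            self.mean, self.var = float(value), 1.0
            return
        delta = value - self.mean
        self.mean += self.alpha * delta
        # Standard exponentially weighted variance update with the same forgetting factor.
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

    def is_typical(self, value, k=3.0):
        """True if the value lies within k standard deviations of the learned mean."""
        return self.mean is not None and abs(value - self.mean) <= k * (self.var ** 0.5)
```

Because the statistics update themselves from the incoming video, no manual calibration of typical texture or motion values is required.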

The main problem which remains, considering these extensions and improvements, is the inability to measure the 3D-size and 3D-position of objects reliably if a monocular approach is applied. This is due to the fact that object size and velocity are estimated within the image plane and that a re-calculation of 3D-parameters of the scene from the estimated 2D-parameters is, in general, ambiguous. To include 3D-information about the scene, an approach using multiple views is presented in this paper.

Fig. 2. Typical Installation of a Video Surveillance Application

This approach combines the evaluation of the measurement data of two cameras with an overlapping field of view, based on a simple 3D-scene model. This video camera configuration is often used in existing surveillance systems (CCTV).

In the following, the object detection concept for the evaluation of one camera is briefly sketched; then the new system evaluating two cameras is described in detail. It is shown that using the combined evaluation of the two cameras, the false detection rate in case of leaves, birds and insects, or blindings can be further reduced compared to a pure monocular evaluation.

DESCRIPTION OF THE OBJECT DETECTION CONCEPT FOR THE EVALUATION OF ONE CAMERA

The block diagram of an object-oriented detection scheme evaluating a single camera is shown in Figure 1. The algorithm includes a combination of a block-based and an object-oriented evaluation of image features. First, a block-oriented change detection and texture analysis indicate those image blocks which contain possible objects and hence should be further investigated. This is done in an object-oriented fashion: a change detection mask is calculated with pel resolution within the indicated blocks, using an algorithm described in [1]. Each coherent changed image region of the change detection mask then gets its own label indicating a separate object candidate to be analysed (object segmentation). For each of these objects, the features signal change, texture and motion are calculated and compared to reference data based on a statistical approach [3]. This comparison yields a decision on whether an object alarm should be given or not (object feature evaluation). Furthermore, this decision influences where and how the reference data should be updated. For this approach, no a priori knowledge about the 3D geometry is included: the evaluation is purely restricted to 2D-image data.
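To make the pipeline of Figure 1 more concrete, the following sketch labels the coherent regions of a pel-resolution change mask and evaluates simple per-object features against learned reference statistics. The feature set, the 3-sigma test and the use of scipy.ndimage are illustrative choices of this sketch, not the exact statistical approach of [3].

```python
import numpy as np
from scipy import ndimage

def detect_objects(change_mask, frame, reference, min_area=50):
    """Object segmentation and feature evaluation on a pel-resolution change mask.

    change_mask: boolean change detection mask.
    frame:       current grayscale image, used for a crude texture feature.
    reference:   dict mapping feature name -> (mean, std) learned for the scene.
    Returns a list of (object label, centroid, alarm flag) tuples.
    """
    labels, count = ndimage.label(change_mask)            # object segmentation
    results = []
    for obj in range(1, count + 1):
        region = labels == obj
        features = {
            "area": float(region.sum()),
            "texture": float(frame[region].std()),        # simple texture measure
        }
        if features["area"] < min_area:                   # discard tiny regions as noise
            continue
        # Object feature evaluation: raise an alarm if any feature deviates strongly
        # from the learned reference statistics (3-sigma rule in this sketch).
        alarm = any(abs(value - reference[name][0]) > 3.0 * reference[name][1]
                    for name, value in features.items())
        results.append((obj, ndimage.center_of_mass(region), alarm))
    return results
```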


Fig. 3. Block Diagram of the Object Detection Concept Combining the Evaluation of Two Cameras

Fig. 4. Example of a Moving Object in Sensor View

In order to consider the 3D-size and 3D-distance of objects, further views of the scene are helpful, here realized by a second camera with an overlapping field of view. This extension of object detection is described in the following section.


Fig. 5. Co-sensor View in Case of the Moving Object: The Object Appears at the Predicted Position

OBJECT DETECTION CONCEPT COMBINING THE EVALUATION OF MULTIPLE CAMERAS

A typical installation which is often used in video surveillance applications is shown in Figure 2. Cameras are installed in a row. The area of interest is separated into different sections which are observed in an overlapping fashion: each section is seen by two cameras. For example, section m+2 is projected into the images of camera n and camera n+1. Assuming that relevant objects move on the surface of the scene to be observed, they are visible in the images of at least two cameras. For an object observed by camera n+1, the object position in camera n is predictable if the 3D-geometry of the scene is known. For a planar scene, Tsai and Huang have shown in [6] that this prediction of image coordinates of camera n+1 (X_{n+1}, Y_{n+1}) into image coordinates of camera n (X_n, Y_n), and vice versa, can be achieved using 8 so-called mapping parameters a_1 ... a_8 according to Equations (1) and (2).

The parameters describe the camera positions as well as the location of the plane.

X_n = (a_1 X_{n+1} + a_2 Y_{n+1} + a_3) / (a_7 X_{n+1} + a_8 Y_{n+1} + 1)    (1)

Y_n = (a_4 X_{n+1} + a_5 Y_{n+1} + a_6) / (a_7 X_{n+1} + a_8 Y_{n+1} + 1)    (2)

In our experiments, the 8 parameters are calculated from 4 given point correspondences using linear regression.
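The following is a possible realization of this estimation step, assuming the 8-parameter form of Equations (1) and (2) above. The least-squares solver here stands in for the linear regression mentioned in the text; with exactly 4 correspondences it returns the exact solution.

```python
import numpy as np

def estimate_mapping(points_src, points_dst):
    """Estimate the mapping parameters a1..a8 from point correspondences.

    points_src: (x, y) image coordinates in camera n+1.
    points_dst: corresponding (x, y) image coordinates in camera n.
    Each correspondence contributes two linear equations in a1..a8.
    """
    A, b = [], []
    for (xs, ys), (xd, yd) in zip(points_src, points_dst):
        # From Eq. (1): xd * (a7*xs + a8*ys + 1) = a1*xs + a2*ys + a3
        A.append([xs, ys, 1.0, 0.0, 0.0, 0.0, -xs * xd, -ys * xd]); b.append(xd)
        # From Eq. (2): yd * (a7*xs + a8*ys + 1) = a4*xs + a5*ys + a6
        A.append([0.0, 0.0, 0.0, xs, ys, 1.0, -xs * yd, -ys * yd]); b.append(yd)
    a, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(b), rcond=None)
    return a        # a[0]..a[7] correspond to a1..a8

def predict_position(a, x, y):
    """Map an image position from camera n+1 into camera n using Equations (1) and (2)."""
    denom = a[6] * x + a[7] * y + 1.0
    return ((a[0] * x + a[1] * y + a[2]) / denom,
            (a[3] * x + a[4] * y + a[5]) / denom)
```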

The detection concept combining the evaluation of two cameras is shown in Figure 3. For each camera, object segmentation and object feature measurement are performed as described in the section on the one-camera concept.


Fig. 6. Example of a Typical Distortion in Sensor View

For a moving object in camera 1, an object position in camera 2 is predicted. If the object does not appear at the predicted position in camera 2, it is classified as irrelevant and no object alarm is given.
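A minimal sketch of this consistency check is given below, reusing the predict_position helper from the previous sketch and assuming the mapping was estimated with camera-1 points as the source. The tolerance radius is an illustrative value; the paper itself combines the cameras on the level of object features rather than prescribing a specific distance test.

```python
def cross_camera_alarm(objects_cam1, objects_cam2, mapping, tolerance=30.0):
    """Confirm camera-1 detections by checking the predicted position in camera 2.

    objects_cam1, objects_cam2: lists of (x, y) object centroids in each camera image.
    mapping: parameters a1..a8 mapping camera-1 coordinates into camera 2.
    tolerance: maximum pel distance between predicted and observed position.
    Returns the camera-1 detections that are confirmed and should raise an alarm.
    """
    confirmed = []
    for (x1, y1) in objects_cam1:
        px, py = predict_position(mapping, x1, y1)   # expected position in camera 2
        # Keep the detection only if some camera-2 object lies near the prediction;
        # otherwise classify it as irrelevant (e.g., an insect close to camera 1).
        if any((x2 - px) ** 2 + (y2 - py) ** 2 <= tolerance ** 2
               for (x2, y2) in objects_cam2):
            confirmed.append((x1, y1))
    return confirmed
```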

EXPERIMENTAL RESULTS

The detection concept based on the evaluation of two cameras has been tested in an installation as shown in Figure 2. Figure 4 shows a moving object in section m+2 in the view of camera n+1. In the view of camera n, this object is visible too (Figure 5), and an object alarm is given.

Figure 6 shows a typical distortion which is not visible in the view of the second camera (Figure 7); hence no alarm is given.

Using the combined evaluation of two cameras, (false) alarms in case of birds, insects and flying foliage are suppressed while moving humans are still detected reliably.

CONCLUSION

A new approach to automatic video surveillance based on the combined evaluation of two cameras has been proposed and some experimental results have been presented. The essential advantages of this new approach are:

- The robustness of object detection is considerably improved compared to a pure monocular approach, because irrelevant objects (e.g., birds, insects) do not cause (false) alarms.

- The additional computational effort is very small, because the combination of the evaluations is done on the basis of object features and not on the basis of image contents.

- No additional installation effort is needed in the case of typical surveillance installations.


Fig. 7. Co-sensor View in Case of the Distortion: The Object Does Not Appear at the Predicted Position


ACKNOWLEDGEMENTS

The authors appreciate the support and encouragement of J.D. Büchs, P. Ribinski, A. Hensel, U. Oppelt and F. Rottmann.

REFERENCES

[1] T. Aach, A. Kaup and R. Mester, 27-29 October 1993, Change Detection in Image Sequences Using Gibbs Random Fields: A Bayesian Approach, Proceedings International Workshop on Intelligent Signal Processing and Communication Systems, Sendai, Japan, pp. 56-61.

[2] M. Hötter, R. Mester and F. Müller, 1995, Detection and Description of Moving Objects by Stochastic Modelling and Analysis of Complex Scenes, Special Issue of Image Communication on Image and Video Semantics: Processing, Analysis and Application.

[3] M. Hötter, R. Mester and M. Meyer, 18-20 October 1995, Detection of Moving Objects in Natural Scenes by a Stochastic Multi-Feature Analysis of Video Sequences, 1995 International Carnahan Conference on Security Technology, Sanderstead, Surrey, UK, pp. 47-52.

[4] M. Meyer, M. Hötter and T. Ohmacht, 2-4 October 1996, A New System for Video-Based Detection of Moving Objects and its Integration into Digital Networks, 1996 International Carnahan Conference on Security Technology, Lexington, Kentucky, USA, pp. 105-110.

[5] R. Mester and M. Hötter, 3-6 July 1995, Robust Displacement Vector Estimation Including a Statistical Error Analysis, 5th International IEE Conference on Image Processing and its Applications, Edinburgh, UK, pp. 168-172.

[6] R.Y. Tsai and T.S. Huang, 1981, Estimating Three-Dimensional Motion Parameters of a Rigid Planar Patch, IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-29, pp. 1147-1152.
