B. Hall, K. Huang, and M. Trivedi, "A Televiewing System for Multiple Simultaneous Customized Perspectives and Resolutions," Submitted to IEEE Int’l Conf. on Intelligent Transportation Systems, Aug 2001.

Abstract— Recent innovations in real-time machine vision, distributed computing, software architectures, and high-speed communication are expanding the available technology for intelligent system development. These technologies allow the realization of intelligent systems that let a user experience events at remote locations in an interactive way. In this paper we describe research aimed at the realization of a powerful televiewing system applied to the traffic incident detection and monitoring needs of today's highways. Sensor clusters utilizing both rectilinear and omni-directional cameras will provide an interactive, real-time, multi-resolution televiewing interface to emergency response crews. Ultimately, this system will have a direct impact on reducing incident-related highway congestion by improving the quality of information to which emergency personnel have access.

Index Terms—Traffic Monitoring, Omnidirectional vision, Human-computer interface, Computer vision, Televiewing

I. INTRODUCTION AND MOTIVATION
The need for and value of developing intelligent systems capable of performing beneficial tasks at remote sites is well recognized. Recent innovations in real-time machine vision, image capture hardware, distributed computing, software architectures, and high-speed communication are expanding the technology base for developing such intelligent systems. These powerful new technologies allow our team to develop intelligent systems that provide analysis tools and televiewing capabilities for users to experience and monitor events at remote locations interactively. However, several unsolved research challenges must still be overcome before such intelligent systems find widespread use.

These research challenges are best addressed in the context of a specific application domain; in this paper we focus on televiewing in a traffic setting. This allows real-world constraints and requirements to be incorporated in the design and development phases. Televiewing is a powerful tool for transmitting a visualization of a remote environment, a process that is important for many applications, and an urban traffic environment is a good example of a setting where televiewing is particularly useful. As a solution to the need for highway incident detection, using helicopters or planes to monitor continuously throughout the day is impractical and expensive. Existing inductive loop sensors are particularly sensitive to proper installation and may be damaged during pavement repair operations; in addition, many installed loop sensors no longer function correctly. The research we present here offers an innovative solution to these challenges through televiewing [13].

The main goal of this research is to realize a powerful, integrated traffic incident detection and monitoring system. Automated software under development by our team will provide an automatic analysis of current traffic conditions using clusters of cameras placed along the roadway. This analysis will give emergency response crews enhanced information with which to detect and analyze a potential incident. Once a potential incident is identified, individual crews can observe the event using the remote camera clusters, choosing the perspective and resolution that best suit the situation. This more interactive and vivid visualization environment will improve the quality of information delivered to emergency response crews, decreasing response time and increasing the preparedness of arriving personnel. The system thus has the potential to make travel safer, smoother, and more economical, reducing wasted fuel and pollution [2].

1 CVRR lab website: http://swiftlet.ucsd.edu; email contact: {tbhall | khuang | mtrivedi}@ucsd.edu

Computer Vision and Robotics Research Laboratory (CVRR), University of California, San Diego (UCSD), La Jolla, CA 92093-0407

A Televiewing System for Multiple Simultaneous Customized Perspectives and Resolutions
Brett Hall, Kohsia Huang, Mohan Trivedi


Figure 1: System Architecture for real-time multiple user televiewing using ODVS and PTZ cameras

II. TELEVIEWING SYSTEM OVERVIEW
In creating this televiewing environment, a strategy is needed to obtain good video coverage of the area. Several strategies are appropriate to this application:

• Use multiple fixed-perspective cameras
• Use fewer cameras with Pan/Tilt/Zoom (PTZ) control
• Use omni-directional video sensors (ODVS), which provide a low-resolution 360° view
• Combine the above, using the most appropriate sensor for each task

This paper focuses on the combined use of ODVS and PTZ cameras. Mounted in a cluster configuration, these cameras create a powerful fusion that enriches the available data and achieves robustness beyond what either type of camera can provide alone. In creating a visualization that provides perspective on demand, the design of the architecture and its implementation are critical to ensure practicality and scalability in real applications. The system we have developed meets both needs, as illustrated in Figure 1.

This architecture fully exploits a key benefit of the ODVS: it captures the full 360° in a single image [5]. Multiple clients can therefore connect to the same remote site and each receive an interactive, custom perspective, as if a "virtual" PTZ camera were under that client's full control. For perspective control, the advantage of the ODVS is that it needs no mechanical moving mount to cover a wide area. A camera on a moving mount violates the single-viewpoint criterion [7] and consumes time while rotating, so it can miss important events when used for general area surveillance. When more detail is needed, the limited resolution of the ODVS can be overcome by a system that provides resolution on demand: high-resolution views of scene detail, from a similar perspective, are available from PTZ rectilinear cameras also located in the video cluster. For example, the ODVS view is well suited to observing general vehicle motion or interactively viewing the scene, while a nearby movable rectilinear camera can provide a high-resolution view of a specific vehicle or incident scene when needed. To scale this architecture to a larger and broader system, the only elements of the ODVS system that need serious consideration are the network and the server hardware, which must handle more simultaneous users without reduced frame rates. The design


[Figure 2 graphic: an object point (RO, ZO) is imaged via the reflection point (RM, ZM) on the hyperboloidal mirror, whose foci lie at (0, c) and (0, −c); the camera lens of focal length f sits at the lower focus and forms the image point (−rI, −c−f) on the CCD. The mirror satisfies the hyperboloid equation Z²/a² − R²/b² = 1, with c² = a² + b².]

of the camera system and client software can support "millions" of users. The PTZ cameras, however, are more complex to scale to a larger system, since only one user at a time can control the position of a given PTZ camera. In practice, the ODVS images should be sufficient for general use of the system almost all of the time.
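The one-capture, many-viewers idea behind this architecture can be sketched as follows. This is a minimal illustration, not the paper's actual server software: a single shared ODVS frame serves every connected client, while each client keeps its own virtual pan/tilt/zoom state. All class and method names here are illustrative.

```python
import threading

class ODVSServer:
    """One omnidirectional frame is published at a time; every client reads
    the same shared frame, so adding clients does not add cameras."""
    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None          # latest omnidirectional image

    def publish_frame(self, frame):
        with self._lock:
            self._frame = frame     # one grab serves every connected client

    def latest_frame(self):
        with self._lock:
            return self._frame

class VirtualPTZClient:
    """Each client holds private pan/tilt/zoom state; no mechanical mount
    is contended, unlike a physical PTZ camera."""
    def __init__(self, server):
        self.server = server
        self.pan, self.tilt, self.zoom = 0.0, 0.0, 1.0   # per-client state

    def render(self):
        frame = self.server.latest_frame()
        if frame is None:
            return None
        # A real client would unwarp `frame` at (pan, tilt, zoom) using the
        # ODVS perspective equations; here we just tag the shared frame.
        return (self.pan, self.tilt, self.zoom, frame)
```

Scaling then reduces to network bandwidth and server throughput, exactly as noted above, because client state never touches the sensor.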

III. TELEVIEWING ALGORITHMS
There are three specific types of algorithms at the core of the televiewing system. The first deals with the optical imaging of a hyperboloidal omnidirectional vision sensor. The second provides the digital PTZ functionality, and the third merges ODVS and high-resolution rectilinear camera images. These algorithms are described in detail below.

A. ODVS Televiewing
An ODVS is a regular camera with an optical mirror attached in front of the lens so that it captures at least a hemispherical view of the surroundings. A true omni-directional view is a spherical view from a single viewing point [7][9]. Depending on the manner of projection, the spherical view can be projected onto a cylindrical belt to form a panoramic view, or onto a plane perpendicular to the line of sight to form a rectilinear perspective view. The ODVS perspective view is well suited to televiewing, and the ODVS is also useful for visual modeling, dynamic view synthesis, and real-time tracking [10][11].

1) ODVS Optical Modeling
In our setup, a hyperboloidal mirror is used for the ODVS. Hyperboloidal mirrors provide a large field of view [8] and satisfy the single-viewpoint property required for generating perspective views [9]. A hyperboloid has two foci. As shown in Figure 2, light from an object directed toward the first focus is reflected to the second focus. If the center of the camera lens is located at the second focus, the image is formed on the CCD plane. The camera view is thus transformed into a hemispherical view looking downward from the upper focus.

Figure 2: Optical modeling of a hyperboloid ODVS.

From Figure 2, given a, b, and c of the hyperboloidal mirror and the focal length f of the camera, the equation relating an object point in 3D space to its image point on the CCD plane is

rI = f RM / (ZM + c)    (1)

where

ZM = m RM + c    (2)

RM = b²(c m + a √(1 + m²)) / (a² − b² m²)    (3)

and m is the slope of the line from the object to the focus (0, c),

m = (ZO − c) / RO    (4)

Note that |m| < a/b because the asymptotic lines are Z = ±(a/b)R. With these equations, a panoramic view can be generated from the ODVS image. A panoramic image plane is a cylindrical belt around the viewing point, i.e., the upper focus in Figure 2. For each pan angle, we project the associated column of the panoramic cylinder onto the CCD plane by equations (1) to (4); the panoramic view can thus be generated from the CCD pixels.
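The mapping of equations (1)-(4) can be sketched directly in Python. The mirror semi-axes a, b and focal length f below are illustrative placeholder values, not the paper's calibration; the function names are also ours.

```python
import math

A, B = 28.0, 16.0             # hyperboloid semi-axes a, b (placeholder values)
C = math.sqrt(A * A + B * B)  # c^2 = a^2 + b^2 fixes the foci at (0, +/-c)
F = 4.0                       # camera focal length f (placeholder value)

def mirror_intersection(m):
    """Equations (2)-(3): intersect the ray Z = m*R + c from the upper focus
    with the mirror surface Z^2/a^2 - R^2/b^2 = 1."""
    assert abs(m) < A / B, "slope must stay inside the asymptotes Z = +/-(a/b)R"
    R_m = B * B * (C * m + A * math.sqrt(1.0 + m * m)) / (A * A - B * B * m * m)
    Z_m = m * R_m + C
    return R_m, Z_m

def world_to_ccd(R_o, Z_o):
    """Equations (1) and (4): radial CCD coordinate r_I of object (R_o, Z_o)."""
    m = (Z_o - C) / R_o                 # (4): slope of the ray toward (0, c)
    R_m, Z_m = mirror_intersection(m)
    return F * R_m / (Z_m + C)          # (1): projection through the lower focus

def panorama_column(pan, heights, R_cyl=1000.0):
    """Sample one pan-angle column of the panoramic cylinder (radius R_cyl)
    and return the corresponding (x, y) positions on the CCD plane."""
    pixels = []
    for z in heights:
        r_i = world_to_ccd(R_cyl, z)
        pixels.append((r_i * math.cos(pan), r_i * math.sin(pan)))
    return pixels
```

Sweeping `pan` over 0 to 2π and sampling each column this way yields the panoramic unwarping described above.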

2) ODVS Perspective Selection
The advantage of the ODVS is that it captures an omnidirectional view in one shot, as shown in Figure 7, which makes it well suited to omnidirectional televiewing. The perspective view of interest can be generated electronically, with adjustable pan, tilt, and zoom, in real time.

Figure 3: Perspective view geometry

The rectilinear perspective view is a planar view from the single viewing point [7][9], which in our case is the upper focus. A perspective view is produced by filling the viewing plane with the corresponding pixels of the ODVS image. As shown in Figure 3, the normal vector n of the perspective plane is rotated by angle θ horizontally from the x-axis and by angle φ vertically above the xy-plane, and u and v are the horizontal and vertical unit vectors of the perspective plane. To find the correspondence between x-y-z and u-v, we define the rotated coordinate system x'-y'-z' as shown, where x' is parallel to n, z' is parallel to v, and y' points in the negative direction of u. The transformation from u-v to x-y-z coordinates is then

(xp, yp, zp)ᵀ = R (FL, −up, vp)ᵀ    (5)

where FL is the perspective focal length, (up, vp) are the coordinates of point P in the perspective view, and R is the rotation matrix from x'-y'-z' to x-y-z,

R = | cosφ cosθ    sinθ   −sinφ cosθ |
    | cosφ sinθ   −cosθ   −sinφ sinθ |
    | sinφ          0       cosφ     |    (6)

The (x, y, z) coordinate can then be projected onto the ODVS CCD plane by equations (1) to (4).
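The perspective selection of equations (5) and (6) can be sketched as follows; `rotation_matrix` builds R and `perspective_to_world` maps a perspective-plane pixel to a 3D direction from the viewing point. Function names are illustrative, not from the paper's software.

```python
import math

def rotation_matrix(theta, phi):
    """Rotation R of equation (6), taking x'-y'-z' coordinates to x-y-z.
    Columns are the x-y-z components of x' (= n), y' (= -u), and z' (= v)."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [
        [cp * ct,  st, -sp * ct],
        [cp * st, -ct, -sp * st],
        [sp,      0.0,  cp],
    ]

def perspective_to_world(u_p, v_p, theta, phi, FL):
    """Equation (5): map a perspective-plane pixel (u_p, v_p) to a 3D point
    (x, y, z) relative to the viewing point (the upper mirror focus)."""
    R = rotation_matrix(theta, phi)
    p = (FL, -u_p, v_p)                  # the pixel in x'-y'-z' coordinates
    return tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))
```

Filling the perspective view is then a loop over (up, vp): each resulting (x, y, z) ray is pushed through equations (1)-(4) to find the ODVS pixel to copy.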

B. High-resolution fusion
For televiewing, fusing the different camera types creates an interactive viewing environment in which the user can change the resolution and perspective to maximize the utility of the application. Our team has written software that smoothly fuses a high-resolution image directly into the lower-resolution ODVS view while the user remotely views the environment. This makes it possible to see detail in the area of interest while maintaining awareness of the nearby environment, enhancing the televiewing experience.

Figure 4: Registration and warping process of a high-resolution image onto the ODVS image

Several parameters must be set to do this, including: registration of the image onto the scene; warping of the high-resolution image to match the perspective of the ODVS view; the area (in pan and tilt) in which the high-resolution image should be visible; and speed and edge-smoothing parameters that complete the integration, creating a smooth transition between the ODVS view and the high-resolution image. The algorithm then uses OpenGL techniques to display and warp the high-resolution view according to the position and zoom inputs of the user. The image is gradually resolution-blended with the ODVS view around the edges, so that the borders of the high-resolution image are not apparent. Additionally, the perspective of the object in view of the high-resolution camera must change as the user adjusts the controls. A simple (and fast) algorithm simulates the change in perspective by stretching the image using four points at its corners, as indicated in Figure 4 [4]. These warping parameters are specified at the extreme positions at which the image is still in view (center, far left, far right, top, and bottom) and linearly interpolated for positions in between. The resulting visualization is seamless: when a high-resolution area is viewed, the user sees a clear region where the detail is sharp and the transition between the ODVS and high-resolution image is smooth.
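The corner interpolation just described can be sketched in one dimension of the control (pan). This is a simplified illustration under our own conventions: `pan` is normalized to run from −1 (far left) to +1 (far right), and each corner set is four (x, y) points; names and ranges are not from the paper.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def interpolate_corners(corners_left, corners_center, corners_right, pan):
    """Warp corners are specified at the extreme pan positions and linearly
    interpolated in between, as described in the text."""
    if pan <= 0.0:                      # blend far-left -> center
        t = pan + 1.0
        lo, hi = corners_left, corners_center
    else:                               # blend center -> far-right
        t = pan
        lo, hi = corners_center, corners_right
    return [(lerp(x0, x1, t), lerp(y0, y1, t))
            for (x0, y0), (x1, y1) in zip(lo, hi)]
```

The same scheme applies independently to the tilt axis; the four interpolated corners then drive the OpenGL quad that stretches the high-resolution texture.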

IV. TELEVIEWING TESTBED & EXPERIMENTAL VALIDATION
To develop and test these algorithms, we have created a research testbed of video clusters that has been the focus of recent work in the CVRR. First we briefly discuss the sensor infrastructure installed on the UCSD campus [1]. Second, we discuss the creation of a mobile video probe for data acquisition. Finally, we discuss the validation of the algorithms described in the previous section using the testbed. We currently have two sensor clusters on the UCSD campus, located on streetlights near an intersection through which buses, cars, bikes, and people regularly pass; this is an excellent location for testing the algorithms under a variety of conditions. Each sensor cluster contains a high-speed pan/tilt rectilinear camera mounted in a weatherproof housing and an ODVS providing a continuous 360° view of the area surrounding the pole (pictured in Figure 5). All of these cameras are wired directly back to the lab using fiber optics, a one-Gigabit Cisco switch, and AXIS video server units, which together provide sixteen high-bandwidth bi-directional live feeds to the CVRR lab.

Figure 5: UCSD base node capable of acquiring 16 video streams over a 1 Gigabit fiber optic link


Creating outdoor sensor clusters as part of the permanent infrastructure has great utility; however, it is not a simple task. Once a cluster is mounted and operational, it is difficult to change its location or perspective if further research determines that a different placement would be better. To enable video cluster data acquisition beyond the permanent mountings, we identified the need for a mobile unit capable of taking on-demand sensor data at heights similar to roadside streetlights. This system can be deployed relatively quickly and easily at nearby locations, providing the diverse data needed to test our algorithms.

Figure 6: Mobile unit for video capture and communication, used for acquiring traffic data

The mobile system (pictured in Figure 6) comprises a ~12' aluminum pole mounted on an electric golf cart and stabilized with chains attached to the cart frame. A camera mount is fixed on top for attachment of the omni-directional video sensor (ODVS) or a rectilinear camera.


Figure 7: An outdoor ODVS image and the generated perspective views. (a) The source ODVS image. (b) GUI for perspective view generation. (c) Perspective views on different pan, tilt, and zoom values.

Onboard the mobile system, video is either stored by a ruggedized PC with dual capture cards and a SCSI hard drive array, or transmitted to the lab via a wireless video link. An inverter drawing power from the main cart batteries powers any necessary peripherals. This mobile video probe has been instrumental in preparing visualizations of outdoor traffic scenes and in collecting traffic video data.


This testbed allows us to exercise the algorithms discussed above and examine the results. Figure 7 shows a source ODVS image and the perspective views generated at different pan, tilt, and zoom values. The outdoor ODVS video is compressed in MJPEG, transmitted over the Internet, and received and decoded on a PC; the perspective video is then generated as in Figure 7(b). All views are produced in real time, demonstrating the feasibility of ODVS televiewing. Figure 8 shows the results of integrating a high-resolution rectilinear image with an ODVS view. This software lets the user look around the area where the ODVS is located in a "virtual" pan/tilt fashion; using this interface, the user can view high-resolution images in the context of their background, creating a more immersive televiewing experience.

Figure 8: Results of seamlessly integrating a high-resolution image and an omni-directional image. The GUI allows an observer to explore remote sites.

V. ACKNOWLEDGEMENTS
Our research is supported in part by the California Digital Media Innovation Program (DiMi) in partnership with the California Department of Transportation (Caltrans). We thank our colleagues from the CVRR who are involved in related research activities, and specifically Rick Capella for his contributions.

VI. REFERENCES
[1] M. Trivedi, I. Mikic, G. Kogut, "Distributed Video Networks for Incident Detection and Management," 3rd IEEE Conference on Intelligent Transportation Systems, Dearborn, October 2000.
[2] I. Mikic, P. Cosman, G. Kogut, M. Trivedi, "Moving Shadow and Object Detection in Traffic Scenes," 15th International Conference on Pattern Recognition, Barcelona, Spain, September 2000.
[3] M. Trivedi, S. Bhonsle, A. Gupta, "Database Architecture for Autonomous Transportation Agents for On-scene Networked Incident Management (ATON)," 15th International Conference on Pattern Recognition, Barcelona, Spain, September 2000.
[4] R. Capella, "Real-Time System for Integrated Omni-Directional and Rectilinear Image-based Virtual Walkthroughs," Masters Thesis, Electrical Engineering, University of California, San Diego, 1999.
[5] H. Ishiguro, "Development of low-cost and compact omnidirectional vision sensors and their applications," Proc. Int. Conf. on Information Systems, Analysis and Synthesis, pp. 433-439, 1998.
[6] J. Miura, T. Kanda, and Y. Shirai, "An Active Vision System for Real-Time Traffic Sign Recognition," Proc. IEEE Int'l Conf. on Intelligent Transportation Systems 2000 (ITSC '00), pp. 52-57.
[7] S. Nayar, "Catadioptric Omnidirectional Camera," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 482-488, June 1997.
[8] S. Baker and S. Nayar, "A Theory of Catadioptric Image Formation," Proc. 6th Int'l Conf. on Computer Vision, pp. 35-42, January 1998.
[9] Y. Onoe, N. Yokoya, K. Yamazawa, and H. Takemura, "Visual Surveillance and Monitoring System Using an Omnidirectional Video Camera," Proc. IEEE 14th Int'l Conf. on Pattern Recognition, pp. 588-592, August 1998.
[10] K. Ng, H. Ishiguro, and M. Trivedi, "Multiple Omni-Directional Vision Sensors (ODVS) based Visual Modeling Approach," Proc. IEEE Visualization '99, October 1999.
[11] K. Ng, H. Ishiguro, M. Trivedi, T. Sogo, "Monitoring Dynamically Changing Environments by Ubiquitous Vision System," IEEE Visual Surveillance Workshop, Fort Collins, Colorado, June 1999.
[12] R. Tsai, "A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses," IEEE J. of Robotics and Automation, Vol. RA-3, No. 4, pp. 323-344, August 1987.
[13] M. Trivedi, B. Hall, G. Kogut, S. Roche, "Web-based Teleautonomy and Telepresence," Proc. SPIE Vol. 4120, pp. 81-85, Applications and Science of Neural Networks, Fuzzy Systems, and Evolutionary Computation III, San Diego, CA, August 2000.
[14] B.D. Stewart, I.A.D. Reading, M.S. Thomson, C.L. Wan, and T.D. Binnie, "Directing Attention for Traffic Scene Analysis," Proc. IEE Int'l Conf. on Image Processing and its Applications 1995 (IPA '95), pp. 801-805.
[15] C. BenAbdelkader, P. Burlina, and L. Davis, "Single Camera Multiplexing for Multi-Target Tracking," Proc. IEE Int'l Conf. on Image Processing and its Applications 1999 (IPA '99), pp. 1140-1143.