Current Work in the .enpeda.. Project
John Morris, Ralf Haeusler, Ruyi Jiang, Khurram Jawed, Rateesh Kalarot, Tariq Khan, Waqar Khan, Sathiamoorthy Manoharan, Sandino Morales, Tobi Vaudrey, Jurgen Wiest, and Reinhard Klette
The University of Auckland, The .enpeda.. Project
Auckland, New Zealand
Abstract—The Environment Perception and Driver Assistance (.enpeda..) project searches for solutions for vision-based driver assistance systems (DAS), which are currently starting to become active safety components of cars (e.g., lane departure warning, blind spot supervision). We review current projects in .enpeda.. in the international context of developments in this field.
I. VISION-BASED DAS
Safety systems that perceive the environment around them and act accordingly are the next step to assure safe driving
conditions. Cameras and computer vision offer flexible active
safety systems and first solutions are already on the market
(e.g., lane departure warning, blind spot supervision). Driver assistance systems (DAS) are developed to predict traffic situations, to adapt driving and the car to the current traffic situation, and to optimize for safety. Vision-based DAS uses one or more cameras to capture the environment and helps to achieve these goals.
Correspondence techniques (stereo or motion analysis) are designed to provide the basic (i.e., low-level) information for more advanced DAS solutions (e.g., 3D lane modeling, ego-motion analysis, tracking of pedestrians or cars, or understanding complex traffic situations). Ego-motion describes the movement of the ego-vehicle, which is the car the given system is operating in.
Figure 1 illustrates the concept of prediction. At the moment
when the ego-vehicle is crossing lanes, borders of lanes are
identified and the corridor defined by the lanes is predicted,
showing the space the ego-vehicle should drive through in the
next few seconds.
This paper is organized as follows: Section 2 points to current algorithms used or evaluated for correspondence analysis in vision-based DAS. Section 3 discusses how stereo and motion analysis merge into scene flow. This is followed by two sections on algorithm performance evaluation (correspondence algorithms in Section 4, and the preparation of a general testbed in Section 5). These evaluations lead to improvements in techniques, and Section 6 gives one example.
Section 7 discusses lane and corridor detection, Section 8 ego-motion estimation, and Section 9 pedestrian detection, as three examples of higher-level analysis tasks in current vision-based DAS. Section 10 concludes.

Fig. 1. Changing lanes (left), identified lane borders (middle), and predicted corridor (right) [23].
II. CORRESPONDENCE ANALYSIS
Correspondence analysis identifies pairs of corresponding pixels, either in a multi-camera system (stereo, with a 1D search space) or in a time sequence of images (motion, with a 2D search space). Vision-based DAS involves concurrent stereo and motion analysis.
Motion information is a key feature in image sequences of dynamic environments such as traffic scenes. Dense motion estimation is more commonly referred to as optic flow. Many approaches have been proposed for extracting 2D optic flow; both dense ([6], [57]) and sparse ([30], [50]) estimation are well covered. Figure 2 shows a flow map.
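For illustration, a dense 2D flow field can be computed with OpenCV's Farneback method and rendered as a false-colour map like the one in Figure 2. This is only a minimal sketch, not one of the algorithms cited above; file names and parameters are placeholders.

```python
import cv2
import numpy as np

# Load two consecutive greyscale frames (file names are placeholders).
prev = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Dense optic flow: one 2D displacement vector (u, v) per pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Visualize as a false-colour map: hue = direction, value = magnitude.
mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros((*prev.shape, 3), dtype=np.uint8)
hsv[..., 0] = ang * 180 / np.pi / 2          # hue in [0, 180)
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
cv2.imwrite("flow_map.png", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
```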
Stereo has been exhaustively studied for single image pairs (state of the art: [12], [17]). Current stereo algorithms (see, e.g., [27]) are basically variations of three different concepts: scan-line optimization, belief propagation, or graph cut. Dynamic programming stereo and semi-global matching are two versions of scan-line optimization. Figure 2 shows a sample depth map.
Both stereo and motion algorithms typically rely on the
intensity constancy assumption (ICA) (i.e., assuming that
object pixels will have the same intensity in the left and right
image, or in two subsequent images).
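To make the ICA concrete, the following is a minimal sketch of ICA-based block matching for rectified stereo: for each pixel, the disparity minimizing a window sum of absolute intensity differences (SAD) is selected. This is a baseline illustration, not one of the cited algorithms; the window size and disparity range are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def block_matching_disparity(left, right, max_disp=64, half_win=3):
    """Brute-force SAD block matching under the ICA.

    left, right: rectified greyscale images, 2D float arrays in [0, 255].
    Returns a dense integer disparity map.
    """
    h, w = left.shape
    best_cost = np.full((h, w), np.inf)
    disparity = np.zeros((h, w), dtype=np.int32)
    for d in range(max_disp):
        # ICA: pixel (y, x) in the left image should have the same
        # intensity as (y, x - d) in the right image at the true disparity.
        diff = np.full((h, w), 255.0)           # penalty where x - d < 0
        diff[:, d:] = np.abs(left[:, d:] - right[:, :w - d])
        # Aggregate absolute differences over a square window (SAD).
        cost = uniform_filter(diff, size=2 * half_win + 1, mode="nearest")
        update = cost < best_cost
        best_cost[update] = cost[update]
        disparity[update] = d
    return disparity
```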
Fig. 2. A synthetic sequence of 396 stereo frames [11] for testing correspondence algorithms. Upper row: a left (grey) and right (colour) view. Bottom row: false colour flow and depth maps.
Dense stereo analysis running in real-time has recently been
achieved in our laboratories. Using two 1024 × 768 pixel
cameras connected directly to an FPGA in a host, we have
been able to generate depth and occlusion maps at 30 frames
per second for a disparity range of 128 [21], [37]. This system
implements a variation of a symmetric dynamic programming stereo (SDPS) algorithm [16], which has a particularly compact and fast hardware realization, leading to its high performance. Figure 3 shows this system capturing a ball in flight. Note that the simple (two-bit) occlusion maps neatly outline objects in the scene: they can be used to enable high-level tasks, such as contour fitting, to proceed in real time [26]. Graphics processing units (GPUs) can also provide real-time performance [40], but current systems are limited by bus congestion for the large images needed for accurate depth.
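To make the scan-line optimization concept concrete, here is a minimal single-scan-line dynamic programming sketch. It is not the SDPS algorithm of [16]: occlusion handling is omitted, and the smoothness weight lam is an illustrative assumption.

```python
import numpy as np

def dp_scanline_disparity(left_row, right_row, max_disp=64, lam=5.0):
    """Scan-line optimization for one rectified image row.

    Minimizes sum_x |L(x) - R(x - d_x)| + lam * |d_x - d_{x-1}|
    by dynamic programming; returns one disparity per pixel.
    """
    w = len(left_row)
    data = np.full((w, max_disp), 255.0)       # ICA-based matching cost
    for d in range(max_disp):
        data[d:, d] = np.abs(left_row[d:] - right_row[:w - d])

    acc = np.zeros_like(data)                  # accumulated costs
    back = np.zeros((w, max_disp), dtype=np.int32)
    acc[0] = data[0]
    for x in range(1, w):
        # transition cost lam * |d - d'| between neighbouring pixels
        trans = acc[x - 1][None, :] + lam * np.abs(
            np.arange(max_disp)[:, None] - np.arange(max_disp)[None, :])
        back[x] = np.argmin(trans, axis=1)
        acc[x] = data[x] + trans[np.arange(max_disp), back[x]]

    # backtrack the optimal disparity path
    disp = np.zeros(w, dtype=np.int32)
    disp[-1] = int(np.argmin(acc[-1]))
    for x in range(w - 2, -1, -1):
        disp[x] = back[x + 1][disp[x + 1]]
    return disp
```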
Fig. 3. Left: Original left image showing the capture of a ball in flight at 30 fps. Right, upper: Disparity map. Right, lower: Occlusion map generated by SDPS hardware. Note that, as a consequence of the cyclopean view used by SDPS, the generated disparity and occlusion maps are twice as wide as the original left image.
III. SCENE FLOW: FUSION OF OPTIC FLOW AND STEREO
Scene flow estimates both 3D motion and position. Recent
work uses stereo cameras to record an image sequence,
providing both stereo and sequential image data. One can now compute scene flow, as there is no missing information except at occlusions.
Fig. 4. Left: scene flow constraint, showing how a moving pixel is depicted in a stereo image sequence. Right: sample vector results; green = stationary, red = moving fast.
One of the most robust methods for sparse (feature based)
scene flow, called 6D-Vision, uses extended Kalman filters to
estimate and provide robust results for 3D motion and 3D
position over time [13]. This approach runs in real-time (25
fps) and is robust to a large range of noise.
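6D-Vision itself applies extended Kalman filters to tracked image features [13]. As a simplified linear analogue, the sketch below runs a constant-velocity Kalman filter on a 6D state (3D position plus 3D velocity), fed with triangulated 3D positions; all noise parameters are illustrative assumptions.

```python
import numpy as np

dt = 0.04                       # frame interval (25 fps)
F = np.eye(6)                   # constant-velocity state transition
F[:3, 3:] = dt * np.eye(3)      # position += dt * velocity
H = np.hstack([np.eye(3), np.zeros((3, 3))])   # we only measure position
Q = 1e-3 * np.eye(6)            # process noise (tuning parameter)
R = 1e-2 * np.eye(3)            # measurement noise (tuning parameter)

x = np.zeros(6)                 # state: [X, Y, Z, Vx, Vy, Vz]
P = np.eye(6)                   # state covariance

def kalman_step(x, P, z):
    """One predict/update cycle given a measured 3D position z."""
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

# e.g., feed triangulated 3D points frame by frame:
# x, P = kalman_step(x, P, np.array([1.2, 0.1, 14.7]))
```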
Dense scene flow was first formalized in 1997 [41] but not
advanced until 2001 [60]. The latest algorithms [20], [58]
focus on a variational approach (incorporating smoothness)
to solve the dense energy equation (based on ICA), using
two sequential stereo pairs. From this information alone, a 3D scene flow estimate can be achieved. The primary difference between [20] and [58] is that the former estimates motion and distance in a coupled framework, while the latter decouples the motion and distance estimation, yielding faster and more accurate results. Figure 4 (left) illustrates the scene flow constraint; Figure 4 (right) shows a sample output of the method from [58].
Dense scene flow still lacks quality, does not yet use robust filters (as in [13]), and the fastest algorithm [58] runs at only 3 fps on a CPU (10 fps on a GPU). There is much scope for further research, as scene flow is important for dynamic environments and moving platforms.
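Schematically, the variational formulations in [20], [58] minimize an energy over the flow field (u, v) and the disparity (change) d of the following form. This shows only the generic structure (ICA-based data terms plus a smoothness term), not the exact functional of either paper:

```latex
E(u,v,d) \;=\; \int_{\Omega} \sum_{k=1}^{3} \Psi\!\Big( \big( I_k(\mathbf{x}+\mathbf{w}_k(u,v,d)) - I_0(\mathbf{x}) \big)^{2} \Big)
\;+\; \lambda\, \Psi\!\Big( |\nabla u|^{2} + |\nabla v|^{2} + |\nabla d|^{2} \Big) \; d\mathbf{x}
```

Here I_0 is the reference image and I_1, I_2, I_3 are the remaining images of the two stereo pairs; w_k is the displacement into image I_k induced by (u, v, d); Psi is a robust penalty, e.g., Psi(s^2) = sqrt(s^2 + eps^2); and lambda weights smoothness against the ICA-based data terms.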
IV. PERFORMANCE EVALUATION
Performance evaluation of stereo and motion algorithms plays an essential role in the design of DAS; both kinds of algorithms are of basic importance for a proper understanding of the vehicle's surroundings and, therefore, for the correct detection of possible hazards. An evaluation should take into account all situations (rain, night, sun strike, etc.) that may occur while driving.
Fig. 5. Different results of a BP stereo algorithm on images recorded less than a second apart. Left and center: left and right input images, respectively. Right: disparity or depth maps where intensity decreases with distance.
In the past few years, performance evaluation followed
standards for stereo [43] and for motion [5] algorithms. Both
methodologies use data sets of reduced length (no more than
six images) consisting of synthetic or engineered images.
Synthetic images are produced with ray-tracing software;
engineered images are captured in laboratories under highly
controlled conditions. The main reason for using indoor data
sets is that the performance of algorithms is evaluated by
comparing their respective outputs with (relatively) easily
obtained ground truth. The computer vision community widely
accepted this methodology as a useful tool to evaluate corre-
spondence algorithms, and this supported good progress in
the field. However, the performance of such algorithms varies
if input images are not generated under optimum conditions
(e.g., due to violations of the ICA in DAS; see [36]). Figure 5
presents results for a belief propagation (BP) stereo algorithm
[12] for stereo pairs recorded within the same second. The
disparity map in the upper row still allows us to detect
‘structure’ in the input images. But the disparity map in the
lower row is useless. Stereo algorithms need to be tested on real-world data that represent situations which may occur during driving. For similar comments on motion algorithms, see [52].
A major drawback of testing algorithms on real-world data is that it is almost impossible to generate ground truth. [42] proposed a prediction error technique to evaluate the performance of stereo or motion algorithms in the absence of ground truth. In our work, this procedure was generalized from image triples (in stereo or motion) to a prediction error analysis of long image sequences. For the stereo case it requires recording at least three image sequences: two of them are used for disparity calculation, and the third for evaluation. [35] uses a geometric variant of the prediction error technique to evaluate the performance of several stereo algorithms on long real-world stereo sequences. Using the calculated disparity and the relative position of the third camera, a virtual image is generated which predicts the appearance of an image captured at the pose of the third camera. The normalized cross correlation (NCC) measure is used for comparing the virtual and third images of a sequence. See Figure 6 for examples of virtual views and actual images recorded with the third camera.
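The NCC measure itself is simple; a minimal global version is sketched below ([35] may apply it differently, e.g., over windows). The score is invariant to affine intensity changes, which makes it suitable for comparing a rendered virtual view with a recorded image.

```python
import numpy as np

def ncc(virtual, third):
    """Normalized cross correlation between two equally sized
    greyscale images (2D float arrays). A value of 1.0 means
    identical up to an affine intensity change; values near 0
    mean no correlation."""
    v = virtual - virtual.mean()
    t = third - third.mean()
    denom = np.sqrt((v * v).sum() * (t * t).sum()) + 1e-12
    return float((v * t).sum() / denom)
```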
Fig. 6. Recorded images (left) and corresponding virtual views (right) generated with the BP disparity maps illustrated in the corresponding rows of Figure 5.
Further research is required to compare the accuracy and relevance of prediction error analysis with other evaluation techniques, such as those based on a photo-realistic [25] and physics-realistic [28] virtual world, as discussed in the next section.
V. MODELING 3D ENVIRONMENTS FOR GROUND TRUTH
Figure 2 illustrates currently available data, where the
sequences are rendered in a completely synthetic world.
Figure 7 demonstrates another approach: a real-world area is modeled in 3D (vegetation and very fine detail are omitted), and a test vehicle (such as HAKA1) drives in this modeled area, with the aim of using ground truth (calculated from the model) for the given trajectory of the test vehicle.
A problem here is how to calculate this trajectory (with reference to the generated 3D model) accurately enough to ensure a meaningful evaluation. Based on our various attempts to estimate this trajectory accurately (e.g., motion from structure, SIFT-based matching, or LIDAR-based matching), we realized that this is a very hard problem. The approach still seems worth following, but its biggest problem is to model the trajectory of the ego-vehicle such that the application of common error metrics [35] is justified for a correctness evaluation.
The precision of estimates for the yaw component of ego-motion is unlikely to come close to the required accuracy: micro-mechanical inertial sensors often drift by 0.1 degrees per second or more. Magnetic or even external optic tracking is not feasible at the large scales relevant in a driver assistance context. Finally, structure from motion approaches [2], [7], [10] suffer from the same bias as the algorithms to be tested, due to the ill-posed problem of feature detection inherent to both. Thus, the more challenging a situation is for driver assistance (i.e., the more likely algorithms are to fail), the less reliable structure from motion becomes for obtaining robust ego-trajectory data. This suggests little potential for substantial improvement.
Thus, we decided to follow another path for ground-truth-
based testing of vision algorithms in large-scale environments.
Current tools (e.g., LuxRender [32]) make simulation of the
image formation process possible.
The big advantage of simulating the image formation process is that single parameters of the image generation process can be altered and their effects on algorithm performance can be studied; this is not possible with real-world imagery. Furthermore, the geometric alignment of the ground truth is ideal.
VI. GENERALIZED RESIDUAL IMAGES FOR DAS
Illumination artifacts, such as major differences in brightness, cause some of the most serious problems for correspondence algorithms (i.e., stereo and optic flow estimators [18], [36]). The primary reason is that both motion and stereo typically rely on the ICA. Therefore, a general way of removing this type of noise is needed.
This problem is less important for synthetically generated data with very low noise components, such as the datasets provided in [5], [43]. Very good results can be obtained there, as the differences between corresponding images are subtle. This can be demonstrated by taking two corresponding images (e.g., t and t+1 for optic flow), warping the first image to the second (using ground truth), and measuring the intensity difference.

Fig. 7. Modeled architecture: frame as captured from HAKA1 (left), a simple render of a 3D model seen with a pose close to that of the image (middle), and corresponding depth map (right).

Fig. 8. Example of illumination difference using RubberWhale [5] (a good scene, top) and Art [18] (differing exposures, bottom). Columns (left to right): sample original image, error between warped original images, sample residual image, error between warped residual images. Error image color key: white = no difference, black = maximum intensity difference (out of 255, i.e., 8-bit image). Maximum for RubberWhale is 166 (original) and 143 (residual), and for Art is 212 (original) and 64 (residual).
Examples of this can be seen in Figure 8. The RubberWhale [5] scene shows how, with near perfect scenes, the intensity constancy assumption holds relatively well. However, for realistic applications (such as DAS), the illumination artifacts can be much worse. Even slight changes in exposure can have a big impact on the illumination difference. The Art [18] scene highlights this: its left and right images were taken with different exposure settings. The results are obvious in Figure 8.
One way of dealing with such illumination differences splits an image into its texture and structure components [1]. This takes an image f(x) and creates a smooth (structure) image s(x); the remaining image is the residual (texture) image r(x) = f(x) - s(x). This process has been generalized more formally [52], showing that any smoothing algorithm, repeated for n iterations, can be used to partially remove illumination artifacts. Residual images improve results for both stereo [18] and optic flow [53], [57]; see Figure 8 (right columns).
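A minimal sketch of this decomposition, using repeated Gaussian smoothing as the structure filter (any of the smoothing filters covered by the generalization in [52] could be substituted); sigma and n are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def residual_image(f, sigma=2.0, n=3):
    """Split f into structure s (smooth) and residual r = f - s.

    f: greyscale image as a 2D float array. Repeating the smoothing
    n times strengthens the structure component; the residual keeps
    the texture and largely discards slow illumination changes."""
    s = f.astype(np.float64)
    for _ in range(n):
        s = gaussian_filter(s, sigma)
    r = f - s
    return s, r

# correspondence algorithms are then run on r instead of f
```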
VII. LANE AND CORRIDOR DETECTION
Road space in front of the ego-vehicle is of great importance
for navigation and driver assistance. The most popular way to model the road is by a lane, defined as an area of the road surface that is completely confined by road boundaries or lane marks. Another modelling strategy uses free space, which is the road region in front of the ego-vehicle where navigation without collision is guaranteed [3], [4]. Similarly, terrain is usually analyzed in navigation in order to find a way to drive through an environment [49]. A corridor is a combination of the lane and the expected trajectory of the ego-vehicle; it predicts the road area for the ego-vehicle in the next several seconds.
Lane detection plays a significant role in DAS, as it can
help estimate the geometry of the road ahead, as well as the
lateral position of the ego-vehicle on the road. A recent review
on lane detection can be found in [34]. Though some commercial lane detection systems are available (mostly for highways), robustly detecting lanes in varying situations, especially in complex urban environments, remains a challenging task.
By applying the lane model of [22], a lane is identified by two lane boundaries. A particle filter is applied to track points along those boundaries in the bird's-eye image. This model does not use any assumption about lane geometry, and applies to all shapes of lanes.

Fig. 9. Experimental results for lane detection. Left: input images. Middle: lanes detected in the bird's-eye image (green). The centerline of a lane is also marked (red). Right: lanes detected in input images.
The row component of a Euclidean distance transform is used, instead of an edge map, as the information source for lane detection. With this distance value, the initialization of the particle filter and the lane tracking become much easier. Some experimental results are shown in Figure 9.
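The following sketch shows how such a dense distance field can be obtained from a bird's-eye edge map (here using the full Euclidean distance transform; [22] uses its row component, and the lane model and particle filter are omitted). The file name and thresholds are placeholders.

```python
import cv2
from scipy.ndimage import distance_transform_edt

# bird's-eye view of the road, greyscale (file name is a placeholder)
birds_eye = cv2.imread("birds_eye.png", cv2.IMREAD_GRAYSCALE)

# binary edge map of lane marks / road boundaries
edges = cv2.Canny(birds_eye, 50, 150)

# Euclidean distance of every pixel to the nearest edge pixel.
# Unlike the sparse edge map itself, this dense field gives a
# particle filter a smooth value to evaluate at any hypothesis.
dist = distance_transform_edt(edges == 0)
```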
There are some limitations in lane detection. Challenges
occur when lane boundaries cannot be properly detected or
hypothesized. For example, when the ego-vehicle is changing
lanes, there are two lanes in front of the vehicle, and the
ego-vehicle is driving partially on each of them. The limited value of lane detection in DAS is due to its single source of information: only physical lane marks are utilized, without considering the ego-vehicle's state. For the detection of a corridor we combine the ego-vehicle's motion state with the estimated lane boundary. When the predicted road patch of constant width hits a road boundary or lane mark (see the intersection point in Figure 10), it smoothly bends to follow those physical features, with minimum deviation from the original direction. In this way, the corridor is decided partly by physical road features and partly by the state of the ego-vehicle. A hypothesis-testing based corridor detection method is introduced in [23]. Some results are shown in Figure 10.
VIII. DIRECT METHOD FOR EGO-MOTION ESTIMATION
Generally, optic flow, feature matching, and direct methods are the three basic options for motion estimation in driver assistance. Optic flow (continuous, e.g., [51]) and feature matching (discrete, e.g., [2]) can be grouped as "correspondence between points", while direct methods use all of the image data in a two-image sequence and do not attempt to establish correspondences between the images. Hence a direct method is relatively robust to quantization noise, illumination differences, and other effects [19].
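The constraint underlying direct methods is the brightness change constraint equation of [19]. For image brightness E(x, y, t) with spatial and temporal derivatives E_x, E_y, E_t, and motion field (u, v):

```latex
E_x\,u + E_y\,v + E_t = 0
```

A direct method substitutes for (u, v) the field induced by the ego-motion parameters (translation and rotation) and the scene depth, and estimates those parameters by minimizing the summed squared residual of this equation over all pixels; no explicit correspondence search is performed. This is a schematic form, not the exact formulation of the papers cited below.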
Fig. 10. Experiment of corridor detection while the ego-vehicle is changing lanes (from left to right, top to bottom).
Based on a constraint equation in [19], a closed-form
solution for scenes with a planar surface is discussed in [38].
Furthermore, the estimation of ego-motion parameters is based
on two monocular sequences in [59], a pair of images from
a stereo rig in [45], three frames in [48], or stereo sequences
in [46]. In [47], direct-method-based motion estimation for driver assistance is discussed, and robust results are shown.
[47] and [39] pointed out that, in typical traffic scenarios, the direct method performs better than any (tested) optic flow-based method. Still, ego-motion estimation based on the direct method is not yet robust (i.e., for varying situations) or practical (e.g., real time). GPS data [33] may also be fused with ego-motion data in the future.
IX. PEDESTRIAN DETECTION
With the advent of DAS, interest in pedestrian detection systems has grown rapidly. Nevertheless, pedestrian detection and tracking is still a challenging problem in ordinary driving situations. The problem is made harder by the requirement that pedestrian detection perform robustly in all types of situations: under variable illumination conditions, for varying orientations and poses (including crouching), and even when pedestrians are partially occluded.
Since [55] proposed a boosted cascade of Haar-like wavelet features for face detection, many authors have applied this machine learning approach to object and pedestrian detection. [8] used this classification technique to detect pedestrians in single images with an extended pool of Haar-like wavelet features. [56] improved on previous work by integrating intensity and motion information for walking-person detection; [24] extended this further by using many more frames as input to the pedestrian detector (in surveillance scenarios). [54] uses local differences of histograms of oriented gradients in combination with a boosting algorithm to detect pedestrians. [9] uses 3D Haar-like features, defining seven types of volume filters to represent a pedestrian's appearance and also to capture motion information.
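For illustration, OpenCV ships a HOG-plus-linear-SVM pedestrian detector in the spirit of the HOG-based work cited above (it is not the boosted variant of [54]); a minimal sketch, with a placeholder file name:

```python
import cv2

# HOG descriptor with the bundled pedestrian (full-body) linear SVM
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("traffic_scene.png")        # placeholder file name
# multi-scale sliding-window detection; returns bounding boxes + scores
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)

for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.png", frame)
```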
[29] fused detection and space-time trajectory estimation into a coupled optimization problem. It is shown that detection and estimation can each be formulated as a quadratic Boolean problem, and that these can be combined into a coupled optimization problem. The resulting hypothesis selection procedure always searches for the best explanation of the current image and all previous observations.
A multi-cue vision system for real-time detection and tracking of pedestrians, called PROTECTOR [15], was developed in collaboration with Daimler AG. The detection system involves a cascade of modules to decrease the image search space. After a stereo-based ROI (region of interest) generation, a shape-based detection technique is used, with templates trained on the Daimler Pedestrian Detection Benchmark Data Set [14].
In our pedestrian detection system we apply the scale-invariant feature transform (SIFT) of [30], [31]. The original SIFT-based object detection scheme is, however, not suitable for the problem of pedestrian detection. Motivated by [44], we divide a pedestrian's body into sub-regions and use independent classifiers, for the simple reason that each classifier then only has to learn the local features of its region.
After estimating the position in 3D space, we apply a linear
multi-model Kalman filter to track the pedestrian’s velocity
and position. This was crucial in increasing the accuracy of
our detection. In contrast to other works, we benefit from the original SIFT approach by matching key points for the object of interest over several time steps; we thereby hope to gain high accuracy for a pedestrian's lateral movement and thus to reduce the influence of a moving camera (e.g., ego-motion or blur).
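A minimal sketch of the keypoint-matching step: SIFT keypoints are matched between consecutive frames with Lowe's ratio test [31], and the displacements of the matched keypoints indicate lateral movement. The sub-region classifiers and the multi-model Kalman filter are omitted, and file names are placeholders.

```python
import cv2

prev = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp0, des0 = sift.detectAndCompute(prev, None)
kp1, des1 = sift.detectAndCompute(curr, None)

# match each descriptor to its two nearest neighbours and apply
# Lowe's ratio test to discard ambiguous matches
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des0, des1, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# displacement of matched keypoints between the two frames indicates
# the lateral movement of the tracked pedestrian region over time
shifts = [(kp1[m.trainIdx].pt[0] - kp0[m.queryIdx].pt[0],
           kp1[m.trainIdx].pt[1] - kp0[m.queryIdx].pt[1]) for m in good]
```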
X. CONCLUSIONS
This paper highlights current projects and challenges in the
.enpeda.. project. Motion and stereo analysis define the basic steps of generating the supporting low-level data needed for proper interpretations of traffic environments. Given the stated need for developing active safety systems for cars, vision-based DAS is the future.
REFERENCES
[1] Aujol, J. F., Gilboa, G., Chan, T., and Osher, S.: Structure-texture image decomposition - modeling, algorithms, and parameter selection. Int. J. Computer Vision, 67:111–136 (2006)
[2] Badino, H.: A robust approach for ego-motion estimation using a mobile stereo platform. In Proc. Int. Workshop Complex Motion, pages 198–208 (2007)
[3] Badino, H., Franke, U., and Mester, R.: Free space computation using stochastic occupancy grids and dynamic programming. In Proc. Workshop on Dynamical Vision, IEEE Int. Conf. Computer Vision (2007)
[4] Badino, H., Mester, R., Vaudrey, T., and Franke, U.: Stereo-based free space computation in complex traffic scenarios. In Proc. IEEE Southwest Symp. Image Analysis Interpretation, pages 189–192 (2008)
[5] Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., and Szeliski, R.: A database and evaluation methodology for optical flow. In Proc. IEEE Int. Conf. Computer Vision, pages 1–8 (2007)
[6] Brox, T., Bruhn, A., Papenberg, N., and Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In Proc. European Conf. Computer Vision (ECCV), pages 25–36 (2004)
[7] Bundler: Structure from motion for unordered image collections. http://phototour.cs.washington.edu/bundler/ (2009)
[8] Chen, Y., and Chen, C.: A cascade of feed-forward classifiers for fast pedestrian detection. In Proc. Asian Conf. Computer Vision, pages 905–914 (2007)
[9] Cui, X., Liu, Y., Shan, S., Chen, X., and Gao, W.: 3D Haar-like features for pedestrian detection. In Proc. IEEE Int. Conf. Multimedia Expo, pages 1263–1266 (2007)
[10] Domke, J., and Aloimonos, Y.: A probabilistic notion of correspondence and the epipolar constraint. In Proc. Int. Symp. 3D Data Processing Visualization Transmission, pages 41–48 (2006)
[11] EISATS (.enpeda.. Image Sequence Analysis Test Site), see http://www.mi.auckland.ac.nz/EISATS (2009)
[12] Felzenszwalb, P. F., and Huttenlocher, D. P.: Efficient belief propagation for early vision. Int. J. Computer Vision, 70:41–54 (2006)
[13] Franke, U., Rabe, C., Badino, H., and Gehrig, S.: 6D-vision: fusion of stereo and motion for robust environment perception. In Proc. Pattern Recognition - DAGM, pages 216–223 (2005)
[14] Gavrila, D.: Daimler pedestrian benchmark data set. Follow 'Looking at people' on http://www.gavrila.net/Research/research.html (2009)
[15] Gavrila, D. M., and Munder, S.: Multi-cue pedestrian detection and tracking from a moving vehicle. Int. J. Computer Vision, 73:41–59 (2007)
[16] Gimel'farb, G. L.: Probabilistic regularisation and symmetry in binocular dynamic programming stereo. Pattern Recognition Letters, 23:431–442 (2002)
[17] Hirschmuller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Analysis Machine Intelligence, 30:328–341 (2008)
[18] Hirschmuller, H., and Scharstein, D.: Evaluation of stereo matching costs on images with radiometric differences. IEEE Trans. Pattern Analysis Machine Intelligence, to appear (2009)
[19] Horn, B. K. P., and Weldon, Jr., E. J.: Direct methods for recovering motion. Int. J. Computer Vision, 2:51–76 (1988)
[20] Huguet, F., and Devernay, F.: A variational method for scene flow estimation from stereo sequences. In Proc. IEEE Int. Conf. Computer Vision, pages 1–7 (2007)
[21] Jawed, K., Morris, J., Khan, T., and Gimel'farb, G.: Real time rectification for stereo correspondence. In Proc. IEEE/IFIP Int. Conf. Embedded Ubiquitous Computing, to appear (2009)
[22] Jiang, R., Klette, R., Wang, S., and Vaudrey, T.: New lane model and distance transform for lane detection and tracking. In Proc. Int. Conf. Computer Analysis Images Patterns, to appear (2009)
[23] Jiang, R., Klette, R., Wang, S., and Vaudrey, T.: Ego-vehicle corridors for vision-based driver assistance. In Proc. Int. Workshop Combinatorial Image Analysis, to appear (2009)
[24] Jones, M. J., and Snow, D.: Pedestrian detection using boosted features over many frames. In Proc. Int. Conf. Pattern Recognition, pages 1–4 (2008)
[25] Kajiya, J. T.: The rendering equation. In Proc. SIGGRAPH, pages 143–150 (1986)
[26] Khan, T., Morris, J., and Jawed, K.: Intelligent vision for mobile agents: Contour maps in real time. Submitted paper (2009)
[27] Klette, R., Jiang, R., Morales, S., and Vaudrey, T.: Discrete driver assistance. In Proc. ISMM (M. H. F. Wilkinson and J. B. T. M. Roerdink, eds.), LNCS 5720, pages 1–12, Springer, Berlin (2009)
[28] Lafortune, E.: Mathematical models and Monte Carlo algorithms for physically based rendering. Dept. Computer Science, Faculty of Engineering, Katholieke Universiteit Leuven (1996)
[29] Leibe, B., Schindler, K., Cornelis, N., and Van Gool, L.: Coupled object detection and tracking from static cameras and moving vehicles. IEEE Trans. Pattern Analysis Machine Intelligence, 30:1683–1698 (2008)
[30] Lowe, D.: Object recognition from local scale-invariant features. In Proc. IEEE Int. Conf. Computer Vision, pages 1150–1157 (1999)
[31] Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Computer Vision, 60:91–110 (2004)
[32] LuxRender. See http://www.luxrender.net/ (2009)
[33] Manoharan, S.: On GPS tracking of mobile devices. In Proc. IEEE Int. Conf. Networking Services, pages 415–418 (2009)
[34] McCall, J. C., and Trivedi, M. M.: Video-based lane estimation and tracking for driver assistance: survey, system, and evaluation. IEEE Trans. Intelligent Transportation Systems, 7:20–37 (2006)
[35] Morales, S., and Klette, R.: A third eye for performance evaluation in stereo sequence analysis. In Proc. Int. Conf. Computer Analysis Images Patterns, to appear (2009)
[36] Morales, S., Woo, Y. W., Klette, R., and Vaudrey, T.: A study on stereo and motion data accuracy for a moving platform. In Proc. FIRA World Congress, to appear (2009)
[37] Morris, J., Jawed, K., and Gimel'farb, G.: Intelligent vision: A first step - real time stereo vision. In Proc. ACIVS, to appear (2009)
[38] Negahdaripour, S., and Horn, B. K. P.: Direct passive navigation. IEEE Trans. Pattern Analysis Machine Intelligence, 9:168–176 (1987)
[39] Ke, Q.: Transforming camera geometry to a virtual downward-looking camera: robust ego-motion estimation and ground-layer detection. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 390–397 (2003)
[40] Park, S., and Jeong, H.: A high-speed parallel architecture for stereo matching. In Proc. ISVC, LNCS 4291, pages 334–342 (2006)
[41] Patras, I., Hendriks, E., and Tziritas, G.: A joint motion/disparity estimation method for the construction of stereo interpolated images in stereoscopic image sequences. In Proc. Annual Conf. Advanced School Computing Imaging (1997)
[42] Scharstein, D.: Prediction error as a quality metric for motion and stereo. In Proc. IEEE Int. Conf. Computer Vision, pages 781–788 (1999)
[43] Scharstein, D., and Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Computer Vision, 47:7–42 (2002)
[44] Shashua, A., Gdalyahu, Y., and Hayon, G.: Pedestrian detection for driving assistance systems: Single-frame classification and system level performance. In Proc. IEEE Intelligent Vehicles Symp., pages 1–6 (2004)
[45] Shieh, J. Y., Zhuang, H., and Sudhakar, R.: Motion estimation from a sequence of stereo images: a direct method. IEEE Trans. Systems Man Cybernetics, 24:1044–1053 (1994)
[46] Stein, G. P.: Geometric and photometric constraints: motion and structure from three views. PhD thesis, Mass. Inst. Technology, Dept. Electrical Engineering Computer Science (1998)
[47] Stein, G. P., Mano, O., and Shashua, A.: A robust method for computing vehicle ego-motion. In Proc. IEEE Intelligent Vehicles Symp., pages 362–368 (2000)
[48] Stein, G. P., and Shashua, A.: Model based brightness constraints: on direct estimation of structure and motion. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 992–1015 (1997)
[49] Thrun, S., Montemerlo, M., and Aron, A.: Probabilistic terrain analysis for high-speed desert driving. In Proc. Robotics Science Systems Conf. (2006)
[50] Shi, J., and Tomasi, C.: Good features to track. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 593–600 (1994)
[51] Tsao, A. T., Fuh, C. S., Hung, Y. P., and Chen, Y. S.: Ego-motion estimation using optical flow fields observed from multiple cameras. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 457–462 (1997)
[52] Vaudrey, T., and Klette, R.: Residual images remove illumination artifacts for correspondence algorithms! In Proc. Pattern Recognition - DAGM, pages 472–481 (2009)
[53] Vaudrey, T., Wedel, A., Chen, C.-Y., and Klette, R.: Improving optical flow using residual images. In Proc. Int. Conf. Arts Information Technology, to appear (2009)
[54] Villamizar, M., Sanfeliu, A., and Andrade-Cetto, J.: Local boosted features for pedestrian detection. In Proc. Iberian Conf. Pattern Recognition Image Analysis, pages 128–135 (2009)
[55] Viola, P., and Jones, M.: Rapid object detection using a boosted cascade of simple features. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 511–518 (2001)
[56] Viola, P., Jones, M., and Snow, D.: Detecting pedestrians using patterns of motion and appearance. In Proc. IEEE Int. Conf. Computer Vision, volume 2, pages 734–741 (2003)
[57] Wedel, A., Pock, T., Zach, C., Bischof, H., and Cremers, D.: An improved algorithm for TV-L1 optical flow. In Visual Motion Analysis (D. Cremers, B. Rosenhahn, and A. Yuille, eds.), pages 23–45 (2009)
[58] Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., and Cremers, D.: Efficient dense scene flow from sparse or dense stereo data. In Proc. European Conf. Computer Vision, pages 739–751 (2008)
[59] Weng, J., and Huang, T. S.: Complete structure and motion from two monocular sequences without stereo correspondence. In Proc. Int. Conf. Pattern Recognition, pages 651–654 (1992)
[60] Zhang, Y., and Kambhamettu, C.: On 3D scene flow and structure estimation. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 778–785 (2001)