
Current Work in the .enpeda.. Project

John Morris, Ralf Haeusler, Ruyi Jiang, Khurram Jawed, Rateesh Kalarot, Tariq Khan, Waqar Khan, Sathiamoorthy Manoharan, Sandino Morales, Tobi Vaudrey, Jürgen Wiest, and Reinhard Klette

The University of Auckland, The .enpeda.. Project

Auckland, New Zealand

Abstract—The Environment Perception and Driver Assistance (.enpeda..) project searches for solutions for vision-based driver assistance systems (DAS), which are currently starting to be active safety components of cars (e.g., lane departure warning, blind spot supervision). We review current projects in .enpeda.. in the international context of developments in this field.

I. VISION-BASED DAS

Safety systems that perceive the environment around them and act accordingly are the next step to assure safe driving conditions. Cameras and computer vision offer flexible active safety systems, and first solutions are already on the market (e.g., lane departure warning, blind spot supervision). Driver assistance systems (DAS) are developed to predict traffic situations, adapt driving and the car to the current traffic situation, and optimize for safety. Vision-based DAS uses one or more cameras to capture the environment and help achieve these goals.

Correspondence techniques (stereo or motion analysis) are designed to provide the basic (i.e., low-level) information for more advanced DAS solutions (e.g., 3D lane modeling, ego-motion analysis, tracking of pedestrians or cars, or understanding complex traffic situations). Ego-motion describes the movements of the ego-vehicle, which is the car the given system is operating in.

Figure 1 illustrates the concept of prediction. At the moment when the ego-vehicle is crossing lanes, the borders of the lanes are identified and the corridor defined by the lanes is predicted, showing the space the ego-vehicle should drive through in the next few seconds.

This paper is organized as follows: Section 2 points to current algorithms used or evaluated for correspondence analysis in vision-based DAS. Section 3 discusses how stereo and motion analysis merge into scene flow. This is followed by two sections on algorithm performance evaluation (correspondence algorithms in Section 4 and the preparation of a general testbed in Section 5). These evaluations lead to improvements in techniques, and Section 6 gives one example. Section 7 discusses lane and corridor detection, Section 8 ego-motion estimation, and Section 9 pedestrian detection, as three examples of higher-level analysis tasks in current vision-based DAS. Section 10 concludes.

Fig. 1. Changing lanes (left), identified lane borders (middle), and predicted corridor (right) [23].

II. CORRESPONDENCE ANALYSIS

Correspondence analysis is used in stereo or motion analysis, in a 1D (stereo) or 2D (motion) search space, to identify pairs of corresponding pixels, either in a multi-camera system (stereo) or in a time sequence of images (motion). Vision-based DAS involves concurrent stereo and motion analysis.

Motion information is a key feature in image sequences of dynamic environments such as traffic scenes. Dense motion estimation is more commonly referred to as optic flow. Many approaches have been proposed for extracting 2D optic flow; both dense ([6], [57]) and sparse ([30], [50]) estimation have been well covered. Figure 2 shows a flow map.

Stereo has been exhaustively studied for single image pairs (state of the art: [12], [17]). Current stereo algorithms (see, e.g., [27]) are basically variations of three different concepts: scan-line optimization, belief propagation, or graph cut. Dynamic programming stereo and semi-global matching are two versions of scan-line optimization. Figure 2 shows a sample depth map.

Both stereo and motion algorithms typically rely on the intensity constancy assumption (ICA), i.e., the assumption that object pixels have the same intensity in the left and right image, or in two subsequent images.
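To make the role of the ICA concrete, the following is a minimal block-matching sketch (a generic illustration, not one of the algorithms discussed above): the sum of absolute differences (SAD) cost is only meaningful if corresponding pixels have (nearly) the same intensity in both views, which is exactly what the ICA asserts.

```python
import numpy as np
from scipy.signal import convolve2d

def sad_disparity(left, right, max_disp=64, win=5):
    """Winner-takes-all SAD block matching on a rectified grey image pair.

    The per-pixel cost |L(x, y) - R(x - d, y)| is only meaningful under the
    ICA; violations (e.g., exposure changes) corrupt it directly.
    """
    half = win // 2
    h, w = left.shape
    kernel = np.ones((win, win), np.float32)
    best_cost = np.full((h, w), np.inf, np.float32)
    disp = np.zeros((h, w), np.int32)
    L = np.pad(left.astype(np.float32), half, mode="edge")
    for d in range(max_disp):
        # Shift the right image by the candidate disparity d.
        # (np.roll wraps around; border columns are unreliable in this sketch.)
        R = np.pad(np.roll(right.astype(np.float32), d, axis=1), half, mode="edge")
        cost = convolve2d(np.abs(L - R), kernel, mode="valid")  # window SAD
        update = cost < best_cost
        best_cost[update] = cost[update]
        disp[update] = d
    return disp
```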

Fig. 2. A synthetic sequence of 396 stereo frames [11] for testing correspondence algorithms. Upper row: a left (grey) and right (colour) view. Bottom row: false colour flow and depth maps.


Dense stereo analysis running in real time has recently been achieved in our laboratories. Using two 1024 × 768 pixel cameras connected directly to an FPGA in a host, we have been able to generate depth and occlusion maps at 30 frames per second for a disparity range of 128 [21], [37]. This system implements a variation of a symmetric dynamic programming stereo (SDPS) algorithm [16] which has a particularly compact and fast hardware realization, leading to the high performance. An example showing the system capturing a ball in flight is shown in Figure 3. Note that the simple (two-bit) occlusion maps neatly outline objects in the scene: they can be used to enable high-level tasks, such as contour fitting, to proceed in real time [26]. Graphical processing units (GPUs) are also able to provide real-time performance [40], but current systems are limited by bus congestion for the large images needed to provide accurate depths.
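For readers unfamiliar with scan-line optimization, the sketch below shows a bare-bones textbook scanline DP stereo (not the SDPS algorithm of [16] implemented in the FPGA system): per scanline, a regular cost table is filled once and backtracked, which is the structural property that makes such methods compact in hardware. The occlusion penalty is a tuning assumption.

```python
import numpy as np

OCC = 8.0  # occlusion penalty (tuning assumption)

def dp_scanline(left_row, right_row):
    """Align one rectified scanline pair by dynamic programming.

    Returns a disparity per left pixel (-1 where occluded). A textbook
    scanline DP sketch, shown only to illustrate the regular table
    structure of scan-line optimization methods.
    """
    n = len(left_row)
    D = np.full((n + 1, n + 1), np.inf)       # accumulated cost table
    D[0, :] = OCC * np.arange(n + 1)
    D[:, 0] = OCC * np.arange(n + 1)
    move = np.zeros((n + 1, n + 1), dtype=np.int8)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            match = D[i - 1, j - 1] + abs(float(left_row[i - 1]) - float(right_row[j - 1]))
            occ_l = D[i - 1, j] + OCC          # left pixel left unmatched
            occ_r = D[i, j - 1] + OCC          # right pixel left unmatched
            D[i, j], move[i, j] = min((match, 0), (occ_l, 1), (occ_r, 2))
    # Backtrack from (n, n) to recover per-pixel disparities.
    disp = np.full(n, -1, dtype=np.int32)
    i = j = n
    while i > 0 and j > 0:
        m = move[i, j]
        if m == 0:
            disp[i - 1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif m == 1:
            i -= 1
        else:
            j -= 1
    return disp
```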

Fig. 3. Left: Original left image showing the capture of a ball in flight at 30 fps. Right, upper: Disparity map. Right, lower: Occlusion map generated by SDPS hardware. Note that, as a consequence of the cyclopean view used by SDPS, the generated disparity and occlusion maps are twice as wide as the original left image.

III. SCENE FLOW: FUSION OF OPTIC FLOW AND STEREO

Scene flow estimates both 3D motion and 3D position. Recent work uses stereo cameras to record an image sequence, providing both stereo and sequential image data. One can then compute scene flow, as there is no missing information except occlusions.


Fig. 4. Left: scene flow constraint, showing how a moving pixel is depicted in a stereo image sequence. Right: sample vector results; green = stationary, red = moving fast.

One of the most robust methods for sparse (feature-based) scene flow, called 6D-Vision, uses extended Kalman filters to robustly estimate 3D motion and 3D position over time [13]. This approach runs in real time (25 fps) and is robust to a large range of noise.

Dense scene flow was first formalized in 1997 [41] but not advanced until 2001 [60]. The latest algorithms [20], [58] focus on a variational approach (incorporating smoothness) to solve the dense energy equation (based on the ICA), using two sequential stereo pairs. From this information alone, a 3D scene flow estimate can be achieved. The primary difference between [20] and [58] is that the former estimates motion and distance in a coupled framework, while the latter decouples the motion and distance estimation, yielding faster and more accurate results. The constraint is illustrated in Figure 4 (left); Figure 4 (right) shows a sample output of the method from [58].
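For orientation, the ICA-based data terms that such variational formulations build on can be sketched as follows, loosely following the decoupled setup of [58]: assume rectified stereo, let a left pixel (x, y) at time t correspond to the right pixel (x + d, y) with disparity d already estimated, and let the optic flow (u, v) and the disparity change Δd be the unknowns. This is a paraphrase, not a verbatim reproduction of either energy.

```latex
\begin{align*}
&\text{flow, left camera:}  && I^{l}(x+u,\, y+v,\, t+1) - I^{l}(x,\, y,\, t) = 0\\
&\text{flow, right camera:} && I^{r}(x+d+\Delta d+u,\, y+v,\, t+1) - I^{r}(x+d,\, y,\, t) = 0\\
&\text{stereo at } t+1:     && I^{r}(x+d+\Delta d+u,\, y+v,\, t+1) - I^{l}(x+u,\, y+v,\, t+1) = 0
\end{align*}
```

A variational method minimizes the (robustified) sum of these residuals plus smoothness terms over the image domain.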

Dense scene flow still lacks quality, does not yet use robust filters (as in [13]), and the fastest algorithm [58] runs at only 3 fps on a CPU (10 fps on a GPU). There is much scope for further research: scene flow is important for dynamic environments and moving platforms.

IV. PERFORMANCE EVALUATION

Performance evaluation of stereo or motion algorithms plays an essential role in the design of DAS; both kinds of algorithms are of basic importance for a proper understanding of the surroundings of the vehicle and, therefore, for the correct detection of possible hazards. An evaluation should take into account all situations (rain, night, sun strike, etc.) that may occur while driving.

Fig. 5. Different results of a BP stereo algorithm on images recorded less than a second apart. Left and center: left and right input images, respectively. Right: disparity or depth maps, where intensity decreases with distance.

In the past few years, performance evaluation has followed standards for stereo [43] and for motion [5] algorithms. Both methodologies use data sets of reduced length (no more than six images) consisting of synthetic or engineered images. Synthetic images are produced with ray-tracing software; engineered images are captured in laboratories under highly controlled conditions. The main reason for using indoor data sets is that the performance of algorithms is evaluated by comparing their respective outputs with (relatively) easily obtained ground truth. The computer vision community widely accepted this methodology as a useful tool to evaluate correspondence algorithms, and this supported good progress in the field. However, the performance of such algorithms varies if input images are not generated under optimum conditions (e.g., due to violations of the ICA in DAS; see [36]). Figure 5 presents results for a belief propagation (BP) stereo algorithm [12] for stereo pairs recorded within the same second. The disparity map in the upper row still allows us to detect 'structure' in the input images, but the disparity map in the lower row is useless. Stereo algorithms need to be tested on real-world data that represent situations which may occur during driving. For similar comments on motion algorithms, see [52].

A major drawback of testing algorithms on real-world data is that it is almost impossible to generate ground truth. [42] proposed a prediction error technique to evaluate the performance of stereo or motion algorithms in the absence of ground truth. This procedure was generalized in our work from image triples in stereo or motion to a prediction error analysis of long image sequences. For the stereo case it requires recording at least three image sequences: two of them are used for disparity calculation, and the third for evaluation. [35] uses a geometric approach to the prediction error technique to evaluate the performance of several stereo algorithms on long real-world stereo sequences. Using the calculated disparity and the relative position of the third camera, a virtual image is generated which predicts the appearance of an image captured at the pose of the third camera. The normalized cross correlation (NCC) measure is used for comparing virtual and third images of a sequence. See Figure 6 for examples of virtual views and actual images recorded with the third camera.
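A minimal sketch of the evaluation measure: global NCC between the predicted (virtual) image and the actually recorded third image. Variable names are ours, and [35] may aggregate the score differently (e.g., over windows, or with occluded pixels masked out).

```python
import numpy as np

def ncc(virtual_img: np.ndarray, third_img: np.ndarray) -> float:
    """Normalized cross correlation between a predicted view and the image
    recorded by the third camera; 1.0 means a perfect prediction."""
    a = virtual_img.astype(np.float64).ravel()
    b = third_img.astype(np.float64).ravel()
    a -= a.mean()  # remove means so the score is invariant to brightness offsets
    b -= b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```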

Fig. 6. Recorded images (left) and corresponding virtual views (right) generated with the BP disparity maps illustrated in the corresponding rows of Figure 5.

Further research is required to compare the accuracy and relevance of prediction error analysis with other evaluation techniques, such as those based on a photo-realistic [25] and physics-realistic [28] virtual world, as discussed in the next section.

V. MODELING 3D ENVIRONMENTS FOR GROUND TRUTH

Figure 2 illustrates currently available data, where the sequences are rendered in a completely synthetic world. Figure 7 demonstrates another approach: a real-world area is 3D modeled (modeling of vegetation or very fine detail is omitted), and a test vehicle (such as HAKA1) drives in this modeled area, the aim being to use ground truth (calculated from the model) for the given trajectory of the test vehicle. A problem here is how to calculate this trajectory (with reference to the generated 3D model) accurately enough to ensure meaningful evaluation. Based on our various attempts to estimate this trajectory accurately (e.g., motion from structure, SIFT-based matching, or LIDAR-based matching), we realized that this is a very hard problem. The approach still seems worth following, and its biggest problem is to model the trajectory of the ego-vehicle such that the application of common error metrics [35] is justified for correctness evaluation.

The precision of estimates for the yaw component of ego-motion is unlikely to come close to this: micro-mechanical inertial sensors often have drifts of not less than 0.1 degrees per second. Magnetic or even external optical tracking is not feasible at the large scales relevant in a driver assistance context. Finally, structure from motion approaches [2], [7], [10] suffer from the same bias as the algorithms to be tested, due to the ill-posed problem of feature detection inherent to both. Thus, the more challenging a situation is for driver assistance (due to a high possibility that algorithms will fail), the more unreliable the results of structure from motion approaches for obtaining robust ego-trajectory data. This suggests that there is not really any potential for substantial improvement.

Thus, we decided to follow another path for ground-truth-based testing of vision algorithms in large-scale environments. Current tools (e.g., LuxRender [32]) make simulation of the image formation process possible.

The big advantage of simulating the image formation process is that single parameters of the image generation process can be altered and their effects on algorithm performance studied. This is not possible with real-world imagery! Furthermore, the geometric alignment of ground truth is ideal.

VI. GENERALIZED RESIDUAL IMAGES FOR DAS

Illumination artifacts, such as major differences in brightness, cause some of the biggest problems for correspondence algorithms (i.e., stereo and optic flow estimators [18], [36]). The primary reason is that both motion and stereo typically rely on the ICA. Therefore, a general way of removing this type of noise is needed.

Fig. 7. Modeled architecture: frame as captured from HAKA1 (left), a simple render of a 3D model seen from a pose close to that of the image (middle), and corresponding depth map (right).

This problem is less important for synthetically generated data with very low noise components, such as the datasets provided in [5], [43]. Very good results can be obtained, as the differences between corresponding images are subtle. This can be demonstrated by taking two corresponding images (e.g., t and t + 1 for optic flow), warping the first image to the second (using ground truth), and measuring the intensity difference. Examples can be seen in Figure 8. The RubberWhale [5] scene shows how, for near-perfect scenes, the intensity constancy assumption holds relatively true. However, for realistic applications (such as DAS), the illumination artifacts can be much worse. Even slight changes in exposure can have a big impact on the illumination difference. The Art [18] scene highlights this: its left and right images were captured with different exposure settings. The results are obvious in Figure 8.

Fig. 8. Example of illumination difference using RubberWhale [5] (a good scene, top) and Art [18] (differing exposures, bottom). Columns (left to right): sample original image, error between warped original images, sample residual image, error between warped residual images. Error image colour key: white = no difference, black = maximum intensity difference (out of 255, i.e., 8-bit image). Maximum for RubberWhale is 166 (original) and 143 (residual), and for Art is 212 (original) and 64 (residual).

One way of dealing with such illumination differences splits an image into its texture and structure components [1]. This takes an image f(x) and creates a smooth (structure) image s(x); the remaining image is the residual (texture) image r(x). This process has been formally generalized in [52], showing that any smoothing algorithm, repeated for n iterations, can be used to partially remove illumination artifacts. Residual images improve results for both stereo [18] and optic flow [53], [57]; see Figure 8 (right columns).
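A minimal sketch of this generalized decomposition, using repeated Gaussian smoothing as the (exchangeable) smoothing operator; the kernel size and iteration count n are our assumptions, not values from [52].

```python
import cv2
import numpy as np

def residual_image(f: np.ndarray, n: int = 3, ksize: int = 9) -> np.ndarray:
    """Split image f into structure s (smooth) and residual r = f - s.

    Per the generalization in [52], any smoothing operator repeated n times
    can serve here; Gaussian smoothing is just one easily available choice.
    """
    s = f.astype(np.float32)
    for _ in range(n):
        s = cv2.GaussianBlur(s, (ksize, ksize), 0)  # structure component
    r = f.astype(np.float32) - s  # texture/residual component, roughly zero-mean
    return r
```

Feeding r(x) instead of f(x) into a stereo or optic flow matcher is what removes the low-frequency illumination differences before the ICA-based cost is ever evaluated.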

VII. LANE AND CORRIDOR DETECTION

The road space in front of the ego-vehicle is of great importance for navigation and driver assistance. Using a lane is the most popular way to model the road; a lane is defined as an area on the road surface that is completely confined by road boundaries or lane marks. Another modelling strategy uses free space, which is the road region in front of the ego-vehicle where navigation without collision is guaranteed [3], [4]. Similarly, terrain is usually analyzed in navigation in order to find a way to drive through an environment [49]. A corridor is a combination of the lane and the expected trajectory of the ego-vehicle; it predicts the road area for the ego-vehicle in the next several seconds.

Lane detection plays a significant role in DAS, as it can help estimate the geometry of the road ahead, as well as the lateral position of the ego-vehicle on the road. A recent review of lane detection can be found in [34]. Though some commercial lane detection systems are already available (mostly for highways), it remains a challenging task to robustly detect lanes in varying situations, especially in complex urban environments.

By applying the lane model of [22], a lane is identified by two lane boundaries. A particle filter is applied to track points along those boundaries in the bird's-eye image. This model makes no assumption about lane geometry and applies to all shapes of lanes.

Fig. 9. Experimental results for lane detection. Left: input images. Middle: lanes detected in the bird's-eye image (green); the centerline of a lane is also marked (red). Right: lanes detected in input images.

The row component of a Euclidean distance transform is used, instead of an edge map, as the information source for lane detection. With this distance value, the initialization of the particle filter and the lane tracking become much easier. Some experimental results are shown in Figure 9.
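A minimal sketch of this information source, under our reading of the 'row component': for every pixel of the bird's-eye image, the horizontal distance to the nearest lane-mark pixel in the same row. Function and variable names are ours; [22] may define and compute this differently.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def row_distance_transform(lane_mask: np.ndarray) -> np.ndarray:
    """Per-row Euclidean distance to the nearest lane-mark pixel.

    lane_mask: boolean bird's-eye image, True where a lane mark was detected.
    Returns, for every pixel, the distance along its row to the closest mark.
    """
    h, w = lane_mask.shape
    dist = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        row = lane_mask[y]
        if row.any():
            # distance_transform_edt measures distance to the nearest zero,
            # so pass the inverted row (zeros exactly at the lane marks).
            dist[y] = distance_transform_edt(~row)
        else:
            dist[y] = w  # no mark in this row: saturate the distance
    return dist
```

Unlike a binary edge map, this field varies smoothly away from the lane marks, which is what makes particle filter initialization and tracking easier.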

There are some limitations to lane detection. Challenges occur when lane boundaries cannot be properly detected or hypothesized. For example, when the ego-vehicle is changing lanes, there are two lanes in front of the vehicle, and the ego-vehicle is driving partially on each of them. The limited value of lane detection in DAS is due to its single source of information: only physical lane marks are utilized, without considering the ego-vehicle's state. For the detection of a corridor, we combine the ego-vehicle's motion state with the estimated lane boundary. When the predicted road patch of constant width hits a road boundary or lane mark at an intersection point (see the intersection point in Figure 10), it smoothly bends to follow those physical features, with the bend defined by the minimum deviation from the original direction. In this way, the corridor is decided partly by physical road features and partly by the state of the ego-vehicle. A hypothesis-testing based corridor detection method is introduced in [23]. Some results are shown in Figure 10.

VIII. DIRECT METHOD FOR EGO-MOTION ESTIMATION

Generally, optic flow, feature matching, and direct methods are the three basic options for motion estimation in driver assistance. Optic flow (continuous, e.g., [51]) and feature matching (discrete, e.g., [2]) can be grouped as 'correspondence between points', while direct methods use all of the image data in a two-image sequence and do not attempt to establish correspondences between the images. Hence a direct method is relatively robust to quantization noise, illumination differences, and other effects [19].
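For reference, the constraints behind direct methods can be sketched as follows (after [19], with brightness E(x, y, t), focal length normalized to 1, instantaneous translation (U, V, W), rotation (A, B, C), and depth Z(x, y); this is the standard textbook notation, not copied verbatim from the cited papers):

```latex
\begin{align*}
& E_x\, u + E_y\, v + E_t = 0 && \text{(brightness change constraint)}\\
& u = \frac{-U + xW}{Z} + Axy - B(1 + x^2) + Cy\\
& v = \frac{-V + yW}{Z} + A(1 + y^2) - Bxy - Cx
\end{align*}
```

Substituting the rigid motion field (u, v) into the brightness change constraint and minimizing the summed residual over all pixels yields the ego-motion parameters without computing any point correspondences.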


Fig. 10. Experiment in corridor detection while the ego-vehicle is changing lanes (from left to right, top to bottom).

Based on a constraint equation in [19], a closed-form solution for scenes with a planar surface is discussed in [38]. Furthermore, ego-motion parameters are estimated from two monocular sequences in [59], from a pair of images of a stereo rig in [45], from three frames in [48], and from stereo sequences in [46]. In [47], a direct-method based motion estimation for driver assistance is discussed, and robust results are shown. [47] and [39] pointed out that in typical traffic scenarios the direct method performs better than any (tested) optic flow-based method. Still, ego-motion estimation based on the direct method is not yet robust (i.e., for varying situations) or practical (e.g., real-time). GPS data [33] may also be fused with ego-motion data in the future.

IX. PEDESTRIAN DETECTION

With the advent of DAS, interest in pedestrian detection systems has grown fast. Nevertheless, pedestrian detection and tracking is still a challenging problem in ordinary driving situations. The problem is made harder by the requirement that pedestrian detection perform robustly in all types of situations: under variable illumination conditions, for varying orientations and poses (including crouching), and even when pedestrians are partially occluded.

Since [55] proposed a boosted cascade of Haar-like wavelet features for face detection, many authors have applied this machine learning approach to object and pedestrian detection. [8] used this classification technique to detect pedestrians in single images with an extended pool of Haar-like wavelet features. [56] improved on this work by integrating intensity and motion information for walking-person detection; [24] extended it further by using many more frames as input to the pedestrian detector (in surveillance scenarios). [54] uses local differences of histograms of oriented gradients in combination with a boosting algorithm to detect pedestrians. [9] uses 3D Haar-like features, defining seven types of volume filters, to represent a pedestrian's appearance and also capture motion information.
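As a runnable point of reference for this family of detectors, OpenCV ships a HOG-based pedestrian detector with a pretrained linear SVM (in the spirit of HOG detection, though not the boosted variant of [54]). A minimal sketch; the image path is a placeholder.

```python
import cv2

img = cv2.imread("street_scene.png")  # placeholder input frame

# HOG descriptor with OpenCV's pretrained pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Sliding-window detection over an image pyramid.
rects, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(8, 8), scale=1.05)

# Draw one bounding box per accepted detection.
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```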

[29] fused detection and space-time trajectory estimation into a coupled optimization problem. It is shown that the detection and estimation procedures can each be formulated as an individual quadratic Boolean problem, and that these can be combined into a coupled optimization problem. The resulting hypothesis selection procedure always searches for the best explanation of the current image and all previous observations.

A multi-cue vision system for real-time detection and tracking of pedestrians, called PROTECTOR [15], was developed in collaboration with Daimler AG. The detection system involves a cascade of modules to decrease the image search space. After stereo-based ROI (region of interest) generation, a shape-based detection technique is used with templates trained on the Daimler Pedestrian Detection Benchmark Data Set [14].

In our pedestrian detection system we apply the scale-invariant feature transform (SIFT) of [30], [31]. The original SIFT-based object detection scheme is, however, not suitable for the problem of pedestrian detection. Motivated by [44], we divide a pedestrian's body into sub-regions and use independent classifiers, for the simple reason that a single classifier then only has to learn the individual local features of its region.

After estimating the position in 3D space, we apply a linear multi-model Kalman filter to track the pedestrian's velocity and position. This was crucial for increasing the accuracy of our detection. In contrast to other work, we benefit from the original SIFT approach by matching key points for the object of interest over several time steps and thus, hopefully, gain a highly accurate estimate of a pedestrian's lateral movement and reduce the influence of a moving camera (e.g., ego-motion or blur).
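A minimal sketch of one component of such a filter bank: a single constant-velocity Kalman filter for a pedestrian's planar position. The paper's filter is linear and multi-model, so treat this as just one model of that bank; the frame interval and noise levels are our assumptions.

```python
import numpy as np

dt = 1.0 / 25.0                       # assumed frame interval
F = np.array([[1, 0, dt, 0],          # state: [x, z, vx, vz] (constant velocity)
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0],           # only position is measured (from detection)
              [0, 1, 0, 0]])
Q = np.eye(4) * 1e-2                  # process noise (assumption)
R = np.eye(2) * 1e-1                  # measurement noise (assumption)

x = np.zeros(4)                       # initial state
P = np.eye(4)                         # initial covariance

def kalman_step(meas):
    """One predict/update cycle with meas = detected (x, z) ground position."""
    global x, P
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.asarray(meas) - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x[:2], x[2:]               # filtered position and velocity
```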

X. CONCLUSIONS

This paper has highlighted current projects and challenges in the .enpeda.. project. Motion and stereo analysis define the basic steps of generating the supporting low-level data needed for proper interpretations of traffic environments. Given the stated need for developing active safety systems for cars, vision-based DAS is the future.

REFERENCES

[1] Aujol, J. F., Gilboa, G., Chan, T., and Osher, S.: Structure-texture image decomposition - modeling, algorithms, and parameter selection. Int. J. Computer Vision, 67:111–136 (2006)
[2] Badino, H.: A robust approach for ego-motion estimation using a mobile stereo platform. In Proc. Int. Workshop Complex Motion, pages 198–208 (2007)
[3] Badino, H., Franke, U., and Mester, R.: Free space computation using stochastic occupancy grids and dynamic programming. In Proc. Workshop on Dynamical Vision, IEEE Int. Conf. Computer Vision (2007)
[4] Badino, H., Mester, R., Vaudrey, T., and Franke, U.: Stereo-based free space computation in complex traffic scenarios. In Proc. IEEE Southwest Symp. Image Analysis Interpretation, pages 189–192 (2008)
[5] Baker, S., Scharstein, D., Lewis, J. P., Roth, S., Black, M. J., and Szeliski, R.: A database and evaluation methodology for optical flow. In Proc. IEEE Int. Conf. Computer Vision, pages 1–8 (2007)
[6] Brox, T., Bruhn, A., Papenberg, N., and Weickert, J.: High accuracy optical flow estimation based on a theory for warping. In Proc. European Conf. Computer Vision (ECCV), pages 25–36 (2004)
[7] Bundler: Structure from motion for unordered image collections. http://phototour.cs.washington.edu/bundler/ (2009)
[8] Chen, Y., and Chen, C.: A cascade of feed-forward classifiers for fast pedestrian detection. In Proc. Asian Conf. Computer Vision, pages 905–914 (2007)
[9] Cui, X., Liu, Y., Shan, S., Chen, X., and Gao, W.: 3D Haar-like features for pedestrian detection. In Proc. IEEE Int. Conf. Multimedia Expo, pages 1263–1266 (2007)
[10] Domke, J., and Aloimonos, Y.: A probabilistic notion of correspondence and the epipolar constraint. In Proc. Int. Symp. 3D Data Processing Visualization Transmission, pages 41–48 (2006)
[11] EISATS (.enpeda.. Image Sequence Analysis Test Site), see http://www.mi.auckland.ac.nz/EISATS (2009)
[12] Felzenszwalb, P. F., and Huttenlocher, D. P.: Efficient belief propagation for early vision. Int. J. Computer Vision, 70:41–54 (2006)
[13] Franke, U., Rabe, C., Badino, H., and Gehrig, S.: 6D-vision: fusion of stereo and motion for robust environment perception. In Proc. Pattern Recognition - DAGM, pages 216–223 (2005)
[14] Gavrila, D. M.: Daimler pedestrian benchmark data set. Follow 'Looking at people' on http://www.gavrila.net/Research/research.html (2009)
[15] Gavrila, D. M., and Munder, S.: Multi-cue pedestrian detection and tracking from a moving vehicle. Int. J. Computer Vision, 73:41–59 (2007)
[16] Gimel'farb, G. L.: Probabilistic regularisation and symmetry in binocular dynamic programming stereo. Pattern Recognition Letters, 23:431–442 (2002)
[17] Hirschmüller, H.: Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Analysis Machine Intelligence, 30:328–341 (2008)
[18] Hirschmüller, H., and Scharstein, D.: Evaluation of stereo matching costs on images with radiometric differences. IEEE Trans. Pattern Analysis Machine Intelligence, to appear (2009)
[19] Horn, B. K. P., and Weldon, Jr., E. J.: Direct methods for recovering motion. Int. J. Computer Vision, 2:51–76 (1988)
[20] Huguet, F., and Devernay, F.: A variational method for scene flow estimation from stereo sequences. In Proc. IEEE Int. Conf. Computer Vision, pages 1–7 (2007)
[21] Jawed, K., Morris, J., Khan, T., and Gimel'farb, G.: Real time rectification for stereo correspondence. In Proc. IEEE/IFIP Int. Conf. Embedded Ubiquitous Computing, to appear (2009)
[22] Jiang, R., Klette, R., Wang, S., and Vaudrey, T.: New lane model and distance transform for lane detection and tracking. In Proc. Int. Conf. Computer Analysis Images Patterns, to appear (2009)
[23] Jiang, R., Klette, R., Wang, S., and Vaudrey, T.: Ego-vehicle corridors for vision-based driver assistance. In Proc. Int. Workshop Combinatorial Image Analysis, to appear (2009)
[24] Jones, M. J., and Snow, D.: Pedestrian detection using boosted features over many frames. In Proc. Int. Conf. Pattern Recognition, pages 1–4 (2008)
[25] Kajiya, J. T.: The rendering equation. In Proc. SIGGRAPH, pages 143–150 (1986)
[26] Khan, T., Morris, J., and Jawed, K.: Intelligent vision for mobile agents: Contour maps in real time. Submitted paper (2009)
[27] Klette, R., Jiang, R., Morales, S., and Vaudrey, T.: Discrete driver assistance. In Proc. ISMM (M. H. F. Wilkinson and J. B. T. M. Roerdink, eds.), LNCS 5720, pages 1–12, Springer, Berlin (2009)
[28] Lafortune, E.: Mathematical models and Monte Carlo algorithms for physically based rendering. Dept. Computer Science, Faculty of Engineering, Katholieke Universiteit Leuven (1996)
[29] Leibe, B., Schindler, K., Cornelis, N., and Van Gool, L.: Coupled object detection and tracking from static cameras and moving vehicles. IEEE Trans. Pattern Analysis Machine Intelligence, 30:1683–1698 (2008)
[30] Lowe, D.: Object recognition from local scale-invariant features. In Proc. IEEE Int. Conf. Computer Vision, pages 1150–1157 (1999)
[31] Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Computer Vision, 60:91–110 (2004)
[32] LuxRender. See http://www.luxrender.net/ (2009)
[33] Manoharan, S.: On GPS tracking of mobile devices. In Proc. IEEE Int. Conf. Networking Services, pages 415–418 (2009)
[34] McCall, J. C., and Trivedi, M. M.: Video-based lane estimation and tracking for driver assistance: survey, system, and evaluation. IEEE Trans. Intelligent Transportation Systems, 7:20–37 (2006)
[35] Morales, S., and Klette, R.: A third eye for performance evaluation in stereo sequence analysis. In Proc. Int. Conf. Computer Analysis Images Patterns, to appear (2009)
[36] Morales, S., Woo, Y. W., Klette, R., and Vaudrey, T.: A study on stereo and motion data accuracy for a moving platform. In Proc. FIRA World Congress, to appear (2009)
[37] Morris, J., Jawed, K., and Gimel'farb, G.: Intelligent vision: A first step - real time stereo vision. In Proc. ACIVS, to appear (2009)
[38] Negahdaripour, S., and Horn, B. K. P.: Direct passive navigation. IEEE Trans. Pattern Analysis Machine Intelligence, 9:168–176 (1987)
[39] Ke, Q.: Transforming camera geometry to a virtual downward-looking camera: robust ego-motion estimation and ground-layer detection. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 390–397 (2003)
[40] Park, S., and Jeong, H.: A high-speed parallel architecture for stereo matching. In Proc. ISVC, LNCS 4291, pages 334–342 (2006)
[41] Patras, I., Hendriks, E., and Tziritas, G.: A joint motion/disparity estimation method for the construction of stereo interpolated images in stereoscopic image sequences. In Proc. Annual Conf. Advanced School Computing Imaging (1997)
[42] Scharstein, D.: Prediction error as a quality metric for motion and stereo. In Proc. IEEE Int. Conf. Computer Vision, pages 781–788 (1999)
[43] Scharstein, D., and Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Computer Vision, 47:7–42 (2002)
[44] Shashua, A., Gdalyahu, Y., and Hayon, G.: Pedestrian detection for driving assistance systems: Single-frame classification and system level performance. In Proc. IEEE Intelligent Vehicles Symp., pages 1–6 (2004)
[45] Shieh, J. Y., Zhuang, H., and Sudhakar, R.: Motion estimation from a sequence of stereo images: a direct method. IEEE Trans. Systems Man Cybernetics, 24:1044–1053 (1994)
[46] Stein, G. P.: Geometric and photometric constraints: motion and structure from three views. PhD thesis, Mass. Inst. Technology, Dept. Electrical Engineering Computer Science (1998)
[47] Stein, G. P., Mano, O., and Shashua, A.: A robust method for computing vehicle ego-motion. In Proc. IEEE Intelligent Vehicles Symp., pages 362–368 (2000)
[48] Stein, G. P., and Shashua, A.: Model-based brightness constraints: on direct estimation of structure and motion. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 992–1015 (1997)
[49] Thrun, S., Montemerlo, M., and Aron, A.: Probabilistic terrain analysis for high-speed desert driving. In Proc. Robotics Science Systems Conf. (2006)
[50] Shi, J., and Tomasi, C.: Good features to track. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 593–600 (1994)
[51] Tsao, A. T., Fuh, C. S., Hung, Y. P., and Chen, Y. S.: Ego-motion estimation using optical flow fields observed from multiple cameras. In Proc. Computer Vision Pattern Recognition, pages 457–462 (1997)
[52] Vaudrey, T., and Klette, R.: Residual images remove illumination artifacts for correspondence algorithms! In Proc. Pattern Recognition - DAGM, pages 472–481 (2009)
[53] Vaudrey, T., Wedel, A., Chen, C.-Y., and Klette, R.: Improving optical flow using residual images. In Proc. Int. Conf. Arts Information Technology, to appear (2009)
[54] Villamizar, M., Sanfeliu, A., and Andrade-Cetto, J.: Local boosted features for pedestrian detection. In Proc. Iberian Conf. Pattern Recognition Image Analysis, pages 128–135 (2009)
[55] Viola, P., and Jones, M.: Rapid object detection using a boosted cascade of simple features. In Proc. Conf. Computer Vision Pattern Recognition, pages 511–518 (2001)
[56] Viola, P., Jones, M., and Snow, D.: Detecting pedestrians using patterns of motion and appearance. In Proc. IEEE Int. Conf. Computer Vision, volume 2, pages 734–741 (2003)
[57] Wedel, A., Pock, T., Zach, C., Bischof, H., and Cremers, D.: An improved algorithm for TV-L1 optical flow. In Visual Motion Analysis (D. Cremers, B. Rosenhahn, and A. Yuille, eds.), pages 23–45 (2009)
[58] Wedel, A., Rabe, C., Vaudrey, T., Brox, T., Franke, U., and Cremers, D.: Efficient dense scene flow from sparse or dense stereo data. In Proc. European Conf. Computer Vision, pages 739–751 (2008)
[59] Weng, J., and Huang, T. S.: Complete structure and motion from two monocular sequences without stereo correspondence. In Proc. Int. Conf. Pattern Recognition, pages 651–654 (1992)
[60] Zhang, Y., and Kambhamettu, C.: On 3D scene flow and structure estimation. In Proc. IEEE Conf. Computer Vision Pattern Recognition, pages 778–785 (2001)
