
Object Segmentation and Modelling Through Robotic

Manipulation and Observing Bilateral Symmetry

by

Wai Ho Li

BE (Hons) in Computer Systems Engineering (2004)

Monash University, Australia

Thesis

Submitted by Wai Ho Li

for fulfillment of the Requirements for the Degree of

Doctor of Philosophy

Supervisor: Associate Professor Lindsay Kleeman

Associate Supervisor: Associate Professor R. Andrew Russell

Intelligent Robotics Research Centre

Department of Electrical and Computer Systems Engineering

Monash University, Australia

December, 2008

In memory of my father

Object Segmentation and Modelling Through Robotic

Manipulation and Observing Bilateral Symmetry

Declaration

I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgment has been made in the text. Similarly, software and hardware systems described in this submission have been designed and implemented without external help unless otherwise stated in the text.

Wai Ho Li
December 3, 2008

© Copyright

by

Wai Ho Li

2008

Object Segmentation and Modelling Through Robotic

Manipulation and Observing Bilateral Symmetry

Wai Ho Li
[email protected]

Intelligent Robotics Research Centre
Department of Electrical and Computer Systems Engineering

Monash University, Australia, 2008

Supervisor: Associate Professor Lindsay Kleeman
[email protected]

Associate Supervisor: Associate Professor R. Andrew Russell
[email protected]

Abstract

Robots are slowly making their way into our lives. Over two million Roomba robots have been purchased by consumers to vacuum their homes. With the aging populations of the USA, Europe and Japan, demand for domestic robots will inevitably increase. Everyday tasks such as setting the dining table require a robot that can deal with household objects reliably. The research in this thesis develops the visual sensing, object manipulation and the autonomy necessary for a robot to deal with household objects intelligently.

As many household objects are visually symmetric, it is worth exploring bilateral symmetry as an object feature. Existing methods of detecting bilateral symmetry have high computational cost or are sensitive to noise, making them unsuitable for robotic applications. This thesis presents a novel detection method targeted specifically at real time robotic applications. The detection method is able to rapidly detect the symmetries of multi-colour, transparent and reflective objects.

The fast symmetry detector is applied to two static visual sensing problems. Firstly, detected symmetry is used to guide object segmentation. Segmentation is performed by identifying near-symmetric edge contours using a dynamic programming approach. Secondly, three dimensional symmetry axes are found by triangulating pairs of symmetry lines in stereo. Symmetry axes are used to localize objects on a table and are especially useful when dealing with surface of revolution objects such as cups and bottles.

The symmetry detector is also applied to the dynamic problem of real time object tracking. The tracker contains a Kalman filter that uses object motion and symmetry synergetically to track an object in real time. An extensive quantitative analysis of the tracking error is performed. By using a pendulum to generate predictable object trajectories, the tracking error is measured against reliable ground truth data. The performance of colour and symmetry as tracking features is also compared qualitatively.

Using the newly developed visual symmetry toolkit, an autonomous robotic system is implemented. This began with giving the robot the ability to autonomously segment new objects. The robot performs segmentation by applying a gentle nudge to an object and analysing the induced motion. The robot is able to robustly and accurately segment objects, including transparent objects, against cluttered backgrounds. The segmentation process is performed without any human guidance.

Finally, the robot's newfound segmentation ability is leveraged to perform autonomous object learning. After performing object segmentation, the robot grasps and rotates the segmented object to gather training images. These robot-collected images are used to produce reusable object models that are collated into an object recognition database. Essentially, the robot learns new symmetric objects through physical interaction. Given that most households contain too many unique objects to model exhaustively, the autonomous learning approach shifts the burden of object model construction from the human user to the tireless robot.

Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1 Motivation and Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Key Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.3.1 Chapter 2: Background Information . . . . . . . . . . . . . . . . . . 10

1.3.2 Chapter 3: Symmetry Detection . . . . . . . . . . . . . . . . . . . . 10

1.3.3 Chapter 4: Sensing Objects in Static Scenes . . . . . . . . . . . . . . 10

1.3.4 Chapter 5: Real Time Object Tracking . . . . . . . . . . . . . . . . 11

1.3.5 Chapter 6: Autonomous Object Segmentation . . . . . . . . . . . . . 11

1.3.6 Chapter 7: Object Learning by Robotic Interaction . . . . . . . . . . 11

1.3.7 Chapter 8: Conclusion and Future Work . . . . . . . . . . . . . . . 12

1.3.8 Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3.9 Appendix B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Background Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1 Visual Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.1.1 Symmetry in Human Visual Processing . . . . . . . . . . . . . . . . 13

2.1.2 Types of Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2 Related Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.1 Accurate Shape Representation . . . . . . . . . . . . . . . . . . . . . 15

2.2.2 Towards Symmetry Detection . . . . . . . . . . . . . . . . . . . . . . 17

2.2.3 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.2.4 Skew Symmetry Detection . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.5 Perceptual Organization . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2.6 Multi-Scale Detection Approaches . . . . . . . . . . . . . . . . . . . 20

2.2.7 Applications of Detection . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2.8 SIFT-Based Approaches . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Symmetry Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2 Fast Bilateral Symmetry Detection . . . . . . . . . . . . . . . . . . . . . . . 24

3.2.1 Novel Aspects of Detection Method . . . . . . . . . . . . . . . . . . 25

3.2.2 Algorithm Description . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.2.3 Extensions to Detection Method . . . . . . . . . . . . . . . . . . . . 31

3.3 Computational Complexity of Detection . . . . . . . . . . . . . . . . . . . . 35

3.4 Detection Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.4.1 Synthetic Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.4.2 Video Frames of Indoor Scenes . . . . . . . . . . . . . . . . . . . . . 37

3.4.3 Computational Performance . . . . . . . . . . . . . . . . . . . . . . . 46

3.5 Comparison with Generalized Symmetry Transform . . . . . . . . . . . . . . 48

3.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.5.2 Comparison of Detection Results . . . . . . . . . . . . . . . . . . . . 48

3.5.3 Comparison of Computational Performance . . . . . . . . . . . . . . 52

3.6 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4 Sensing Objects in Static Scenes . . . . . . . . . . . . . . . . . . . . . . . . 55

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.2 Monocular Object Segmentation Using Symmetry . . . . . . . . . . . . . . 56

4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.2.2 The Symmetric Edge Pair Transform . . . . . . . . . . . . . . . . . . 57

4.2.3 Dynamic Programming and Contour Refinement . . . . . . . . . . . 59

4.2.4 Segmentation Results . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4.2.5 Computational Performance . . . . . . . . . . . . . . . . . . . . . . . 65

4.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.3 Stereo Triangulation of Symmetric Objects . . . . . . . . . . . . . . . . . . 67

4.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4.3.2 Camera Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4.3.3 Triangulating Pairs of Symmetry Lines . . . . . . . . . . . . . . . . . 69

4.3.4 Experiments and Results . . . . . . . . . . . . . . . . . . . . . . . . 72

4.3.5 Accuracy of Symmetry Triangulation . . . . . . . . . . . . . . . . . . 75

4.3.6 Qualitative Comparison with Dense Stereo . . . . . . . . . . . . . . 77

4.3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4.4 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

5 Real Time Object Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.2 Real Time Object Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5.2.1 Improving the Quality of Detected Symmetry . . . . . . . . . . . . . 85

5.2.2 Block Motion Detection . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.2.3 Object Segmentation and Motion Mask Refinement . . . . . . . . . 89

5.2.4 Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.3 Object Tracking Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.3.1 Discussion of Tracking Results . . . . . . . . . . . . . . . . . . . . . 96

5.3.2 Real Time Performance . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.4 Bilateral Symmetry as a Tracking Feature . . . . . . . . . . . . . . . . . . . 99

5.4.1 Obtaining Ground Truth . . . . . . . . . . . . . . . . . . . . . . . . 100

5.4.2 Quantitative Analysis of Tracking Accuracy . . . . . . . . . . . . . . 105

5.4.3 Qualitative Comparison Between Symmetry and Colour . . . . . . . 114

5.5 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

6 Autonomous Object Segmentation . . . . . . . . . . . . . . . . . . . . . . 121

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6.1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

6.1.3 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

6.1.4 System Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

6.2 Detecting Interesting Locations . . . . . . . . . . . . . . . . . . . . . . . . . 129

6.2.1 Collecting Symmetry Intersects . . . . . . . . . . . . . . . . . . . . . 129

6.2.2 Clustering Symmetry Intersects . . . . . . . . . . . . . . . . . . . . . 129

6.3 The Robotic Nudge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

6.3.1 Motion Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

6.3.2 Obtaining Visual Feedback by Stereo Tracking . . . . . . . . . . . . 133

6.4 Object Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

6.5 Autonomous Segmentation Results . . . . . . . . . . . . . . . . . . . . . . . 136

6.5.1 Cups Without Handles . . . . . . . . . . . . . . . . . . . . . . . . . . 136

6.5.2 Mugs With Handles . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

6.5.3 Beverage Bottles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

6.6 Discussion and Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . 139

7 Object Learning by Interaction . . . . . . . . . . . . . . . . . . . . . . . . 143

7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

7.1.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

7.1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

7.2 Autonomous Object Grasping After a Robotic Nudge . . . . . . . . . . . . 146

7.2.1 Robot Gripper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

7.2.2 Determining the Height of a Nudged Object . . . . . . . . . . . . . . 148

7.2.3 Object Grasping, Rotation and Training Data Collection . . . . . . 150

7.3 Modelling Objects using SIFT Descriptors . . . . . . . . . . . . . . . 151

7.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

7.3.2 SIFT Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

7.3.3 Removing Background SIFT Descriptors . . . . . . . . . . . . . . . . 152

7.3.4 Object Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

7.4 Autonomous Object Learning Experiments . . . . . . . . . . . . . . . . . . 157

7.4.1 Object Recognition Results . . . . . . . . . . . . . . . . . . . . . . . 157

7.5 Discussion and Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . 160

8 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 169

8.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

8.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

Appendix A Multimedia DVD Contents . . . . . . . . . . . . . . . . . . . . 177

A.1 Real Time Object Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

A.2 Autonomous Object Segmentation . . . . . . . . . . . . . . . . . . . . . . . 177

A.3 Object Learning by Interaction . . . . . . . . . . . . . . . . . . . . . . . . . 178

Appendix B Building a New Controller for the PUMA 260 . . . . . . . . 179

B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

B.2 System Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

B.3 Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

B.3.1 PUMA 260 Physical Parameters . . . . . . . . . . . . . . . . . . . . 181

B.3.2 Direct Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

B.3.3 Inverse Kinematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

List of Figures

2.1 Bilateral and skew symmetry. . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Rotational and radial symmetry. . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Research related to bilateral symmetry detection. . . . . . . . . . . . . . . . 22

3.1 Fast symmetry – Convergent voting. . . . . . . . . . . . . . . . . . . . . . . 28

3.2 Fast symmetry – Edge pixel rotation and grouping. . . . . . . . . . . . . . . 29

3.3 Fast skew symmetry detection. . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.4 Unwanted symmetry detected from a horizontal line. . . . . . . . . . . . . . 33

3.5 Symmetry detection results – Synthetic images. . . . . . . . . . . . . . . . . 38

3.6 Symmetry detection result – Single symmetric object. . . . . . . . . . . . . 39

3.7 Symmetry detection result – Multiple symmetric objects. . . . . . . . . . . 41

3.8 Detecting non-object symmetry lines. . . . . . . . . . . . . . . . . . . . . . . 42

3.9 Rejecting unwanted symmetry with angle limits. . . . . . . . . . . . . . . . 44

3.10 Symmetry detection results – Challenging objects. . . . . . . . . . . . . . . 45

3.11 Detection execution time versus angle range. . . . . . . . . . . . . . . . . . 47

3.12 Fast symmetry versus gen. symmetry – Test image 1. . . . . . . . . . . . . 49

3.13 Fast symmetry versus gen. symmetry – Test image 2. . . . . . . . . . . . . 51

4.1 Overview of object segmentation steps. . . . . . . . . . . . . . . . . . . . . . 60

4.2 Object contour detection and contour refinement. . . . . . . . . . . . . . . . 63

4.3 Segmentation of a multi-colour mug. . . . . . . . . . . . . . . . . . . . . . . 64

4.4 Object segmentation on a scene with multiple objects. . . . . . . . . . . . . 65

4.5 Stereo vision hardware setup. . . . . . . . . . . . . . . . . . . . . . . . . . . 70

4.6 Test objects used in the triangulation experiments. . . . . . . . . . . . . . . 73

4.7 Example stereo data set – Multi-colour mug. . . . . . . . . . . . . . . . . . 74

4.8 Triangulation results for reflective metal can. . . . . . . . . . . . . . . . . . 75

4.9 Dense stereo disparity result – Textured bottle. . . . . . . . . . . . . . . . . 78

4.10 Dense stereo disparity result – Transparent bottle. . . . . . . . . . . . . . . 79

4.11 Dense stereo disparity result – Reflective can. . . . . . . . . . . . . . . . . . 80

5.1 System diagram of real time object tracker. . . . . . . . . . . . . . . . . . . 84

5.2 Using angle limits to reject non-object symmetry. . . . . . . . . . . . . . . . 86

5.3 Motion mask object segmentation – White bottle. . . . . . . . . . . . . . . 90

5.4 Motion mask object segmentation – White cup. . . . . . . . . . . . . . . . . 91

5.5 Generating rotated bounding boxes – Transparent bottle. . . . . . . . . . . 94

5.6 Generating rotated bounding boxes – Multi-colour mug. . . . . . . . . . . . 95

5.7 Pendulum hardware. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

5.8 Automatically extracted symmetry line. . . . . . . . . . . . . . . . . . . . . 102

5.9 Pendulum video – White background. . . . . . . . . . . . . . . . . . . . . . 103

5.10 Pendulum video – Red background. . . . . . . . . . . . . . . . . . . . . . . . 103

5.11 Pendulum video – Edge background. . . . . . . . . . . . . . . . . . . . . . . 104

5.12 Pendulum video – Mixed background. . . . . . . . . . . . . . . . . . . . . . 104

5.13 Example of symmetry detection under edge noise. . . . . . . . . . . . . . . . 108

5.14 White background – Sym. tracking error plots. . . . . . . . . . . . . . . . . 109

5.15 Red background – Sym. tracking error plots. . . . . . . . . . . . . . . . . . 110

5.16 Edge background – Sym. tracking error plots. . . . . . . . . . . . . . . . . . 111

5.17 Mixed background – Sym. tracking error plots. . . . . . . . . . . . . . . . . 112

5.18 White background – Histograms of sym. tracking errors. . . . . . . . . . . . 113

5.19 Red background – Histograms of sym. tracking errors. . . . . . . . . . . . . 113

5.20 Edge background – Histograms of sym. tracking errors. . . . . . . . . . . . 113

5.21 Mixed background – Histograms of sym. tracking errors. . . . . . . . . . . . 113

5.22 Hue-saturation histogram back projection. . . . . . . . . . . . . . . . . . . . 115

5.23 Effects of different backgrounds on colour centroid. . . . . . . . . . . . . . . 116

5.24 White background – Colour tracking error plot. . . . . . . . . . . . . . . . . 118

5.25 Red background – Colour tracking error plot. . . . . . . . . . . . . . . . . . 118

5.26 Edge background – Colour tracking error plot. . . . . . . . . . . . . . . . . 119

5.27 Mixed background – Colour tracking error plot. . . . . . . . . . . . . . . . . 119

6.1 Robotic system hardware components. . . . . . . . . . . . . . . . . . . . . 126

6.2 Autonomous object segmentation flowchart. . . . . . . . . . . . . . . . . . 127

6.3 The robotic nudge – Side view. . . . . . . . . . . . . . . . . . . . . . . . . . 131

6.4 The robotic nudge – Overhead view. . . . . . . . . . . . . . . . . . . . . . 131

6.5 Video images of robotic nudge. . . . . . . . . . . . . . . . . . . . . . . . . . 132

6.6 Workspace visualization of robotic nudge. . . . . . . . . . . . . . . . . . . . 133

6.7 Motion segmentation using symmetry. . . . . . . . . . . . . . . . . . . . . . 135

6.8 Autonomous segmentation results – Cups. . . . . . . . . . . . . . . . . . . . 137

6.9 Autonomous segmentation results – Mugs with handles. . . . . . . . . . . . 138

6.10 Autonomous segmentation results – Beverage bottles. . . . . . . . . . . . . 140

6.11 Autonomous segmentation results – Beverage bottles (continued). . . . . . . 141

7.1 Robot gripper and angle bracket. . . . . . . . . . . . . . . . . . . . . . . . . 147

7.2 Photos of robot gripper. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

7.3 Detecting the top of a nudged object. . . . . . . . . . . . . . . . . . . . . . 148

7.4 Uncertainty of stereo triangulated object height. . . . . . . . . . . . . . . . 149

7.5 Autonomously collected training data set – Green bottle. . . . . . . . . . . 150

7.6 SIFT detection example – White bottle training image. . . . . . . . . . . . 153

7.7 Removing background SIFT descriptors. . . . . . . . . . . . . . . . . . . . . 154

7.8 Object recognition using learned SIFT descriptors. . . . . . . . . . . . . . . 156

7.9 Bottles used in object learning and recognition experiments. . . . . . . . . . 158

7.10 Object recognition result – White bottle (match01.png). . . . . . . . . . . 161

7.11 Object recognition result – Yellow bottle (match02.png). . . . . . . . . . . 162

7.12 Object recognition result – Green bottle (match02.png). . . . . . . . . . . 163

7.13 Object recognition result – Brown bottle (match03.png). . . . . . . . . . . 164

7.14 Object recognition result – Glass bottle (match03.png). . . . . . . . . . . . 165

7.15 Object recognition result – Cola bottle (match03.png). . . . . . . . . . . . 166

7.16 Object recognition result – Transparent bottle (match02.png). . . . . . . . 167

B.1 Overview of new robot arm controller. . . . . . . . . . . . . . . . . . . . . . 180

B.2 New stand-alone controller for the PUMA 260. . . . . . . . . . . . . . . . . 180

List of Tables

3.1 Execution time of fast bilateral symmetry detection . . . . . . . . . . . . . 46

3.2 Execution time of generalized symmetry on test images . . . . . . . . . . . 52

3.3 Execution time of fast symmetry on test images . . . . . . . . . . . . . . . . 52

4.1 Execution time of object segmentation . . . . . . . . . . . . . . . . . . . . . 66

4.2 Triangulation error at checkerboard corners . . . . . . . . . . . . . . . . . . 76

5.1 Object tracker execution times and frame rate . . . . . . . . . . . . . . . . . 99

5.2 Pendulum ground truth data regression residuals . . . . . . . . . . . . . . . 102

5.3 Pendulum symmetry tracking error statistics . . . . . . . . . . . . . . . . . 106

7.1 Object recognition results – SIFT descriptor matches . . . . . . . . . . . . . 159

B.1 PUMA 260 link and joint parameters . . . . . . . . . . . . . . . . . . . . . . 182

List of Algorithms

1 Fast bilateral symmetry detection . . . . . . . . . . . . . . . . . . . . . . . . 27

2 Symmetric edge pair transform (SEPT) . . . . . . . . . . . . . . . . . . . . . 58

3 Score table generation through dynamic programming . . . . . . . . . . . . . 61

4 Contour detection by backtracking . . . . . . . . . . . . . . . . . . . . . . . . 62

5 Block motion detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Acknowledgments

Marston Bates said that "research is the process of going up alleys to see if they are blind." Many thanks to my supervisor Lindsay Kleeman, who steered my research away from the blind alleys with a constant supply of practical advice and constructive criticism. I would like to thank my associate supervisor Andy Russell for invaluable discussions about robotics and biological organisms. Thanks also go to Ray Jarvis, who regularly shared his deep insights on robotics research. During our travels to conferences near and far, Lindsay, Andy and Ray were constant sources of interesting stories spanning robotics, biology, philosophy and comedy.

I was fortunate enough to study at the Intelligent Robotics Research Centre, which is full of helpful and competent people. Geoff Taylor eased my transition into postgraduate studies by answering numerous questions covering both theory and practice. In the lab, Albert Diosi constantly humbled and inspired me with his tireless work ethic. Alan M. Zhang was always available for lengthy discussions about robotics and algorithm design. Konrad Schindler was ever willing to share his encyclopedic knowledge of computer vision research. Steve Armstrong lent his time and expertise to help debug hardware issues. Nghia, Damien, Dennis, Jay and a long list of others were always available for lunch, which included obligatory newspaper quizzes and freeform current events discussions. I am grateful to be amongst such friends.

Without the support of my family, my PhD studies and the completion of this dissertation would have been impossible. To my sister, Joey, thank you for reminding me that rest is an important reprieve from thesis writing. To my parents, Peter and Mary, thank you for nurturing my curiosity. My achievements, past, present and future, are a product of your hard work.

The research in this thesis was supported by an Australian Postgraduate Award and the Intelligent Robotics Research Centre at Monash University. Funding for conference travel was also provided by the Monash Research Graduate School and the Department of Electrical and Computer Systems Engineering. I gratefully acknowledge these sources of financial support.

Finally, I sincerely thank everyone who offered their help and condolences after they learned about my father's death. You made finishing this dissertation in the wake of tragedy much easier.

And as I look at the trends that are now starting to converge, I can envision a future in which robotic devices will become a nearly ubiquitous part of our day-to-day lives.

Bill Gates, January 2007

1 Introduction

It seems that all the right doors are opening for intelligent robots to make their entry into the household. This chapter's quote comes from Bill Gates' A Robot in Every Home article in Scientific American magazine, where he suggests that the availability of low cost computational power and better sensors is clearing the way for affordable and diverse domestic robotic systems. A European survey of public opinion [Ray et al., 2008] also observes strong consumer interest in domestic robots. The survey indicates a generally positive attitude towards domestic robots, especially those that will help alleviate the tedium of repetitive daily tasks such as vacuum cleaning, setting the table and cleaning the dishes.

Roboticists are also noticing the emerging consumer demand for domestic robots. In a 2008 workshop on Robot Services in Aging Societies [Buss et al., 2008], prominent robotics researchers discussed the role of technology in the face of the developed world's aging societies. According to World Bank statistics presented by Henrik Christensen [Christensen, 2008], the current ratio of 0.2 retired versus working individuals in the USA will increase to 0.45 by 2040. This problem is worse in Europe and worst in Japan, with the latter having an expected ratio of retirees versus workers of 0.6 in 2040. During the closing discussions, a Japanese researcher also mentioned an industrial estimate of a US$23 billion domestic robotics market in Japan by 2030. Additionally, the workshop highlighted the need for robotic systems that can compensate for the physical and mental limitations of the elderly and the disabled, such as helping a person dress in the morning and reminding the user if they forget to take their medicine.

The aforementioned robotics workshop [Buss et al., 2008] also touched on the forms that future domestic robots will take. Unlike the current crop of mobile robots designed to vacuum the house or mow the lawn, the next generation of domestic robots will likely incorporate robot manipulators. For example, [Christensen, 2008] shows a robot arm attachment for a motorized wheelchair that can grasp common household objects. However, many technological obstacles stand in the way of large scale consumer adoption of such domestic robots. One of the most challenging is the ability to deal with household objects reliably, including objects for which the robot has no prior models. For example, tasks such as cleaning dishes and fetching a beverage demand the robust sensing and reliable manipulation of objects, including new objects that the robot has never encountered before. The research presented in this thesis develops the visual sensing methods, object manipulation techniques and autonomy needed by domestic robots to deal with common household objects such as cups and bottles.

1.1 Motivation and Challenges

A domestic robot that can perform object interaction with the same grace and agility as a person will leave roboticists in awe. The simple task of fetching a drink from a table highlights the complex sensing and manipulations humans make on a regular basis. The following is an example of the steps a robot will take in order to perform the same drink fetching task. Adhering to the popular Sense-Plan-Act paradigm, the robot begins by recognizing the target beverage amongst the objects on the table. After localizing the beverage in three dimensions, the robot generates an action plan for grasping the object. The planning will need to take into account the robot arm's workspace and the locations of obstacles in the environment. Physically stable locations on the object where the robot's fingers can be placed to ensure a sturdy grasp must also be calculated during planning. Finally, the robot will perform the grasp by positioning its gripper around the target object. This usually requires inverse kinematics calculations to find robot arm joint angles or closed-loop servoing of the end effector. In general, the sensors external to the robot arm are also monitored during the grasp to assess whether the object manipulation is carried out successfully. Possible problems such as the spillage of the beverage container's contents, temporal changes in the environment and occlusions of the target beverage have been ignored in the above scenario.

The complicated series of steps the robot must take to perform the drink fetching task illustrates a fundamental challenge of robotics: the challenge of robotic naivety. A robot knows little about anything. Robotic systems rely on human-provided knowledge to interpret the sensor data they receive during online operation. For example, a mobile robot can localize itself by comparing laser range finder readings with a metric map of the environment. More subtly, a robot performing Simultaneous Localization and Mapping (SLAM) relies on human-provided algorithms and constraints to build a map, to remove inconsistencies and to close the loop. Similarly, a robot relies on human guidance in the form of pre-programmed actions or kinematics algorithms to move its actuators in order to perform useful actions. A major challenge addressed by this thesis is the minimization of the a priori knowledge the robot requires while maximizing its flexibility and robustness when performing practical domestic tasks.

Robots that deal with graspable household objects generally use vision as their primary sensing modality. A video camera provides spatially dense information at high refresh rates, which is ideal for cluttered indoor scenes such as the household. Also, cameras cost less than active range finders and are able to operate at short distances where time-of-flight sensors cannot. A vision-based robotic system performing the aforementioned drink fetching task requires a massive quantity of a priori information. For example, an object recognition system requires object models such as 3D surface meshes or visual features extracted from training images. As such, the drink fetching robot will require object models for every beverage it needs to fetch in order to ensure reliable object recognition performance. Considering studies showing that an average person encounters several thousand unique objects in the household [Buss et al., 2008], exhaustive model building will be very labour intensive and probably intractable. As such, the development of model-free methods to sense household objects is an important challenge that will help reduce a domestic robot's dependence on a priori knowledge.

Sensing household objects for which the robot has no model appears to be a chicken-and-egg problem. How can a robot sense an object when it does not know what the object looks like? The trick is to tell the robot what kind of features to extract. For example, a robot searching for tennis balls can be instructed to visually sense round objects with a yellow hue. A useful observation for domestic environments is that many household objects are bilaterally symmetric. This observation is especially applicable to surface of revolution objects such as cups and bottles, which are bilaterally symmetric from many viewing orientations. Statistically, it is rare for symmetry to occur by chance. Visual symmetry usually indicates an object or a manually arranged symmetric constellation of objects. Both kinds of symmetry provide useful and reliable information. However, bilateral symmetry is rarely employed in robotic systems. This appears to be due to the lack of a robust and computationally fast method of detection. Therefore, the author is motivated to design and implement a fast method of bilateral symmetry detection that can function robustly on real world images. By using symmetry, the user is freed from having to explicitly provide training images or manually constructed object models. This is what is meant by model-free vision. The symmetry detector also addresses the challenge of developing model-free vision methods that will help a robot robustly deal with new objects without a priori models.
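
To make the idea of a feature hint concrete, the sketch below shows how the tennis ball example could be realized with a modern OpenCV build: threshold a rough yellow hue band, then look for circular blobs. It is purely illustrative and not part of the work in this thesis; the hue band and the Hough circle parameters are arbitrary assumptions.

```cpp
// Illustrative only: sense "round objects with a yellow hue" as a feature hint.
// The hue band and HoughCircles parameters below are assumed values.
#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Vec3f> findYellowRoundObjects(const cv::Mat& bgr)
{
    cv::Mat hsv, yellowMask, blurred;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

    // Keep pixels whose hue falls inside a rough "tennis ball yellow" band.
    cv::inRange(hsv, cv::Scalar(20, 80, 80), cv::Scalar(40, 255, 255), yellowMask);

    // Smooth the binary mask and search it for circular blobs.
    cv::GaussianBlur(yellowMask, blurred, cv::Size(9, 9), 2.0);
    std::vector<cv::Vec3f> circles;   // each circle is (centre x, centre y, radius)
    cv::HoughCircles(blurred, circles, cv::HOUGH_GRADIENT, 1.0,
                     blurred.rows / 8, 100, 20, 10, 100);
    return circles;
}
```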

Robust object recognition methods such as Boosted Haar Cascades [Viola and Jones, 2001] require large quantities of training data, in the order of several hundred to a thousand images for each object. Approaches such as SIFT [Lowe, 2004] require fewer training images but rely on manual segmentation of the target objects. Limiting the scope to surface of revolution objects such as cups, an object is defined as a physical entity that moves coherently when actuated. This definition allows a robot to detect and segment objects by applying physical manipulation in parallel with model-free visual sensing methods. This approach is interesting as it departs from the norm of moving the camera while keeping the object stationary. By actuating an object instead of the camera, a robot can autonomously obtain visual segmentations that are representative of the physical world. This opens up the possibility of additional autonomy such as training data collection and object learning. As such, the visual sensing and manipulator control challenges posed by autonomous object segmentation are inherently interesting and worth addressing.

The above challenges are further detailed below.

Fast and Robust Detection of Bilateral Symmetry

Gestalt theorists observed that symmetry is an important visual property that people use to represent and detect objects. It is a visual property that rarely occurs by chance and can be used for local feature extraction as well as global object description. In the field of computer vision, symmetry detection has been an active area of research for the last three decades. Chapter 2 provides a taxonomy of the different types of symmetry as well as a comprehensive survey of existing detection methods. While a fast radial symmetry detection method is available [Loy and Zelinsky, 2003], the literature survey revealed the need for a fast and robust bilateral symmetry detection method. More specifically, many robotic applications demand a symmetry detector that operates in real time on videos of real world scenes captured using a low cost colour camera.

Development of Model-Free Object Sensing Methods

Given the successful development of a fast bilateral symmetry detector, model-free methods of object sensing can be designed and implemented. These methods should provide a robot with a visual sensing toolbox to deal with symmetric objects without having to rely on a priori information such as the colour and shape of a target object. In particular, these methods will be applicable to surface of revolution objects such as cups and bottles that are bilaterally symmetric from many points of view. Ideally, these model-free methods will be fast and robust so that they can meet real time requirements, which are plentiful in robotic applications.

For a domestic robot dealing with symmetric household objects, several sensing functions are especially important. Firstly, a model-free object segmentation method is needed. Segmentation allows for the detection of objects as well as obtaining useful size and shape information. Secondly, a stereo vision method is needed to localize symmetric objects in three dimensional space. Object localization allows the use of robotic manipulation to actuate and grasp objects. Finally, real time object tracking should be developed. Real time tracking is needed to identify moving objects in the robot's environment and to determine the effects of robotic action on a target object. Overall, the model-free sensing methods should be fast, robust and general so that they can be applied to different types of robotic problems.

Autonomous Object Segmentation

Inspired by the work of Fitzpatrick [Fitzpatrick, 2003a], the concept of autonomous object segmentation is further explored. Fitzpatrick's robotic system sweeps its one-fingered end effector across a scene to simultaneously detect and segment objects using a poking action. Objects are discovered by visually monitoring for a sharp jump in motion caused by a collision between the end effector and the object. Object segmentation is performed offline using a graph cuts approach by analysing the motion in the video frames near the time of effector-object impact. Due to the harsh nature of the object poking action, his test objects need to be durable and unbreakable. Also, his approach sometimes includes the end effector in object segmentations and is prone to producing near-empty segmentations.

Unlike Fitzpatrick's accidental approach to object discovery and segmentation, the aim here is to leverage the visual symmetry toolkit to generate a plan before applying robotic action. Essentially, the robot produces hypotheses of object locations and then tests the validity of these hypotheses using robotic actions. This enables the application of gentle and controlled robotic action, which allows the manipulation of fragile objects. Additionally, poor segmentations can be prevented by monitoring the actuated object during robotic manipulation. This allows the robot to detect failed manipulations, such as the target object being tipped over by the robot. Also, object segmentation should be performed online so that the robot can quickly resume visual sensing of its environment.

Object Learning by Robotic Interaction

A domestic robot requires autonomy in order to perform household tasks intelligently. The ability to autonomously segment objects allows greater autonomy. Information gained from autonomous object segmentation can be used to guide more advanced manipulations such as object grasping. This effectively leverages a simple robotic manipulation to perform more complex manipulations. Once an object has been grasped, training data collection and object learning become possible. Object learning allows a robot to adapt to changing environments by learning new objects autonomously. Autonomous object learning is useful for domestic robots that have to deal with the large number of unique objects in the average home. Ideally, the burden of object modelling will be shifted from the researcher and user to the robot.

1.2 Key Contributions

Five conference papers have been published on the research described in this thesis. An IJRR journal article [Li et al., 2008] covers research on bilateral symmetry detection as well as using detected symmetry to segment objects and to perform real time object tracking. The research contributions made by the work in this thesis are detailed below.

Fast and Robust Bilateral Symmetry Detection

The core novelty of the proposed fast bilateral symmetry detection method is its quick execution time. Running on a 1.73 GHz Pentium M laptop, the C++ implementation of the fast symmetry detector only requires 45 ms to find symmetry lines across all orientations from a 640 × 480 input image. The detection time can be reduced linearly by restricting the angular range of detection. At the time of writing, the proposed method is the fastest bilateral symmetry detector available. As a point of comparison, timing trials documented in this thesis show that the generalized symmetry transform [Reisfeld et al., 1995] is roughly 8000 times slower than the proposed detection method. The proposed symmetry detection method is also faster than SIFT-based approaches such as [Loy and Eklundh, 2006].

As detailed in Section 2.2, the majority of symmetry detection methods are unable to operate on real world images. For example, experimental results presented in this thesis show that the generalized symmetry transform [Reisfeld et al., 1995] is unable to deal with background intensity changes. This restricts the ability of the generalized symmetry transform to operate on real world images, which regularly have backgrounds with non-uniform intensity. SIFT approaches rely on texture symmetry as opposed to contour symmetry, which means physically asymmetric objects with symmetric surface patterns are considered symmetric.

The fast symmetry detection method is able to operate on real world images by leveraging the noise robustness of the Hough transform and Canny edge detection. As fast symmetry uses edge pixels as input, it is able to detect the symmetry lines of multi-colour, reflective or transparent objects. The fast symmetry algorithm was first published in [Li et al., 2005]. An updated version of the detection method, with lower computational cost and greater noise robustness, is available in the author's IJRR article [Li et al., 2008]. Details of both publications are as follows.

• Wai Ho Li, Alan M. Zhang and Lindsay Kleeman. Fast Global Reflectional Symmetry Detection for Robotic Grasping and Visual Tracking. In Proceedings of Australasian Conference on Robotics and Automation, Sydney, December, 2005.

• Wai Ho Li, Alan M. Zhang and Lindsay Kleeman. Bilateral Symmetry Detection for Real-time Robotics Applications. International Journal of Robotics Research (IJRR), 2008, Volume 27, Number 7, pages 785 to 814.

Real Time Object Segmentation using Bilateral Symmetry

Segmentation using symmetry has been previously investigated in [Gupta et al., 2005]. Their method uses symmetry to augment the affinity matrix of a normalized cuts approach. Normalized cuts produces accurate segmentations but has a very high computational cost, making it unsuitable for real time applications. Additionally, the approach of Gupta et al. assumes symmetric pixel values within an object's contour. This assumption does not hold for symmetric objects with asymmetric textures or when non-uniform illumination causes shadows and specular reflections on symmetric objects.

The proposed symmetry-guided object segmentation method is fast, requiring an average of 35 ms for 640 × 480 pixel images. Also, the assumption of symmetric internal pixel values is removed by using a Dynamic Programming (DP) approach that operates on edge pixels to find near-symmetric object contours. Unlike traditional DP approaches [Lee et al., 2001; Mortensen et al., 1992; Yu and Luo, 2002] that rely on manually specified control points or curves, the proposed segmentation method is initialized automatically using an object's detected symmetry line. The symmetry-guided segmentation approach is published in the following paper.

• Wai Ho Li, Alan M. Zhang and Lindsay Kleeman. Real Time Detection and Segmentation of Reflectionally Symmetric Objects in Digital Images. In Proceedings of IEEE/RSJ Conference on Intelligent Robots and Systems (IROS06), Beijing, October, 2006, pages 4867 to 4873.

Stereo Triangulation using Symmetry

Traditional stereo methods rely on matching corresponding features on an object's surface to obtain three dimensional information. The proposed symmetry triangulation approach differs from the norm by using a structural object feature for matching. Symmetry triangulation can deal with transparent and reflective objects. Stereo methods such as those surveyed in [Scharstein and Szeliski, 2001] are unable to deal with these objects due to their unreliable surface pixel information across stereo views.

Additionally, unlike stereo methods that triangulate features on the surface of objects, symmetry triangulation returns an axis that passes through the inside of an object. The triangulated symmetry axis is especially useful when dealing with surface of revolution objects such as cups and bottles, as their symmetry axes are equivalent to their axes of revolution. A symmetry axis can also be used to localize an object by looking for the intersection between the axis and the table on which the object rests. The work on stereo symmetry triangulation has led to the following publication.

• Wai Ho Li and Lindsay Kleeman. Fast Stereo Triangulation using Symmetry. In Proceedings of Australasian Conference on Robotics and Automation, Auckland, December, 2006.
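
As a geometric illustration of the triangulation idea, and not the implementation developed in this thesis, the sketch below back-projects each detected 2D symmetry line into a plane through its camera centre and intersects the two planes to obtain the 3D symmetry axis. The Eigen library, the projection matrix convention and the homogeneous line representation are assumptions made for this sketch.

```cpp
// Minimal sketch: intersect the back-projected planes of two symmetry lines.
// Assumes each camera has a 3x4 projection matrix P = K[R|t] in a shared world
// frame, and each symmetry line is given in homogeneous pixel coordinates
// l = (a, b, c) with a*u + b*v + c = 0.
#include <Eigen/Dense>

struct Axis3D {
    Eigen::Vector3d point;      // a point on the symmetry axis
    Eigen::Vector3d direction;  // unit direction of the axis
};

// Back-projecting an image line through a camera gives the world plane
// pi = P^T * l, stored as (n_x, n_y, n_z, d) with n.X + d = 0.
static Eigen::Vector4d backProjectLine(const Eigen::Matrix<double, 3, 4>& P,
                                       const Eigen::Vector3d& l)
{
    return P.transpose() * l;
}

Axis3D triangulateSymmetryAxis(const Eigen::Matrix<double, 3, 4>& Pleft,
                               const Eigen::Vector3d& lineLeft,
                               const Eigen::Matrix<double, 3, 4>& Pright,
                               const Eigen::Vector3d& lineRight)
{
    Eigen::Vector4d piL = backProjectLine(Pleft, lineLeft);
    Eigen::Vector4d piR = backProjectLine(Pright, lineRight);
    Eigen::Vector3d nL = piL.head<3>();
    Eigen::Vector3d nR = piR.head<3>();

    Axis3D axis;
    axis.direction = nL.cross(nR).normalized();   // the axis lies in both planes

    // One point on the axis: satisfy both plane equations, with a third row that
    // picks the point on the axis closest to the world origin.
    Eigen::Matrix3d A;
    A.row(0) = nL.transpose();
    A.row(1) = nR.transpose();
    A.row(2) = axis.direction.transpose();
    Eigen::Vector3d b(-piL(3), -piR(3), 0.0);
    axis.point = A.colPivHouseholderQr().solve(b);
    return axis;
}
```

Intersecting the recovered axis with the table plane then localizes the object on the table, as described above.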

Real Time Object Tracking using Symmetry

The author was the first to implement a real time visual tracking system using bilateral symmetry as the primary object feature. By using the author's fast bilateral symmetry detector, the Kalman filter tracker is able to operate at 40 frames per second on 640 × 480 video. The high tracking speed is achieved by feeding back the tracker prediction to the fast symmetry detector in order to limit the angular range of detection. Experiments on ten real world videos that include transparent objects suggest high tracking robustness. The tracker also provides a real time symmetry-refined motion segmentation of the tracked object.
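
The prediction-feedback loop described above can be sketched as follows. This is a schematic rather than the thesis tracker: the detector callback, the state parameterization (a symmetry line in polar form plus its velocities) and all noise and window parameters are assumptions, and angle wrap-around is ignored for brevity.

```cpp
// Schematic of feeding the Kalman prediction back to restrict symmetry detection.
#include <opencv2/opencv.hpp>
#include <functional>

struct SymmetryLine { float rho; float theta; };   // polar parameters of a symmetry line

using SymmetryDetectFn =
    std::function<SymmetryLine(const cv::Mat& edges, float thetaMin, float thetaMax)>;

void trackSymmetryLine(cv::VideoCapture& video, const SymmetryDetectFn& detectSymmetry)
{
    // Constant-velocity Kalman filter over (rho, theta, d_rho, d_theta).
    cv::KalmanFilter kf(4, 2);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);
    cv::setIdentity(kf.measurementMatrix);                        // measure (rho, theta)
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-3));
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1.0));

    const float window = 0.26f;   // search +/- ~15 degrees around the prediction (assumed)
    cv::Mat frame, grey, edges;
    while (video.read(frame)) {
        cv::cvtColor(frame, grey, cv::COLOR_BGR2GRAY);
        cv::Canny(grey, edges, 50, 150);

        // Feed the predicted angle back to the detector to narrow its angular search.
        cv::Mat prediction = kf.predict();
        float thetaHat = prediction.at<float>(1);
        SymmetryLine measured = detectSymmetry(edges, thetaHat - window, thetaHat + window);

        cv::Mat z = (cv::Mat_<float>(2, 1) << measured.rho, measured.theta);
        kf.correct(z);   // the corrected state is the tracked symmetry line for this frame
    }
}
```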

A custom-built pendulum is used to quantitatively analyse the tracking error against reliable ground truth data. The tracking error is measured for each video frame, allowing for a deeper understanding of the tracker's behaviour with respect to object speed and background noise. This departs from the normal computer vision practice of evaluating trackers based on their success rate at maintaining a convergent tracking estimate over a set of test videos. Additionally, a qualitative comparison is performed between HSV colour centroid and bilateral symmetry as tracking features.

The symmetry tracker was first published as a conference paper at IROS [Li and Kleeman, 2006b]. The work is also documented in the IJRR article [Li et al., 2008], with the addition of the quantitative error analysis and the colour versus symmetry comparison.

• Wai Ho Li and Lindsay Kleeman. Real Time Object Tracking using Reflectional Symmetry and Motion. In Proceedings of IEEE/RSJ Conference on Intelligent Robots and Systems (IROS06), Beijing, October, 2006, pages 2798 to 2803.

• Wai Ho Li, Alan M. Zhang and Lindsay Kleeman. Bilateral Symmetry Detection for Real-time Robotics Applications. International Journal of Robotics Research (IJRR), 2008, Volume 27, Number 7, pages 785 to 814.

Autonomous Object Segmentation

Most active vision methods focus on actuating the camera to obtain multiple views of a static object. Departing from the norm, the work of [Fitzpatrick, 2003a] used robotic action to actuate objects in order to perform motion segmentation. The proposed approach uses the same ideology of actuating objects instead of moving the camera, but differs from Fitzpatrick's approach as follows.

Firstly, instead of Fitzpatrick's accidental approach to object discovery, the proposed approach finds interesting locations to explore prior to any robotic action. This allows the generation of an action plan that can incorporate higher level strategies such as exploring objects nearest the camera first. Additionally, the actuation trajectory can be chosen such that changes in object scale and orientation are minimized.

Secondly, a short and gentle robotic nudge is used to actuate a target object. This departs from the fast sweeping action of Fitzpatrick's robot, which requires unbreakable test objects as collisions are unpredictable and high impact. Experiments show that the robotic nudge is able to actuate fragile and top heavy objects. The robotic nudge also has a small workspace footprint, which allows for easy path planning and collision avoidance.

Thirdly, Fitzpatrick's approach is prone to poor segmentations where the result is nearly empty or contains the robot's end effector. Examples of these poor segmentations are shown in Figure 11 of [Fitzpatrick and Metta, 2003]. The proposed approach uses a series of visual checks during object manipulation to prevent poor segmentations. Near-empty segmentations are prevented by initiating stereo tracking upon detecting object motion. This ensures that insufficient object motion or the object being tipped over does not result in a segmentation attempt. Additionally, the proposed approach never includes the end effector in the segmentation result, as the input video images used for motion segmentation are taken when the end effector is out of view.

Finally, the author uses a different approach to perform motion segmentation. Fitzpatrick uses a computationally expensive graph cuts approach, which requires several seconds of offline processing to perform object segmentation. The proposed symmetry-based approach performs motion segmentations online, taking only 80 ms for a temporal pair of 1280 × 960 pixel images. This makes the approach more suited to real time applications where the robot cannot afford to pause its visual sensing. The work on autonomous object segmentation resulted in the following publication.

• Wai Ho Li and Lindsay Kleeman. Autonomous Segmentation of Near-Symmetric Objects through Vision and Robotic Nudging. In Proceedings of IEEE/RSJ Conference on Intelligent Robots and Systems (IROS08), Nice, September, 2008, pages 3604 to 3609.

Object Learning by Robotic Interaction

The majority of robotic vision systems model objects visually by actuating the camera. The proposed autonomous object learning approach departs from the norm by grasping and rotating an object while the camera remains static. The following research contributions are made by the work.

Firstly, the work shows that a simple object manipulation, the aforementioned robotic nudge, can enable the use of more advanced manipulations such as object grasping. Experiments show that bottles of various shapes, both heavy and light, can be grasped autonomously after performing object segmentation using the robotic nudge.

Secondly, a contribution is made by showing that a robot can autonomously collect useful training data. After a successful grasp, the robot rotates the grasped object in front of its camera to collect training images at fixed angular intervals. These images allow the entire 360 degrees of the grasped object to be modelled visually.

Finally, SIFT descriptors [Lowe, 2004] are used to build object models from robot-collected training images. Non-object SIFT descriptors are automatically pruned to prevent their inclusion into object models. The resulting descriptor sets are used as object models in a recognition database. Experiments confirm that it is possible to use robot-learned object models to perform robust object recognition.
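
A minimal sketch of the modelling and matching steps described above is given below. It uses a modern OpenCV build purely for illustration and is not the thesis code; the function names, the use of the segmentation mask for pruning and the ratio-test threshold are assumptions.

```cpp
// Illustrative sketch: build SIFT object models from segmented training images and
// score a scene against a model by counting ratio-test matches.
#include <opencv2/opencv.hpp>
#include <vector>

// Keep only descriptors inside the object's segmentation mask so that background
// features are not baked into the model (the pruning idea described above).
cv::Mat buildModel(const std::vector<cv::Mat>& trainingImages,
                   const std::vector<cv::Mat>& objectMasks)
{
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    cv::Mat model;   // one row per descriptor, accumulated over all training views
    for (size_t i = 0; i < trainingImages.size(); ++i) {
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        sift->detectAndCompute(trainingImages[i], objectMasks[i], keypoints, descriptors);
        if (!descriptors.empty())
            model.push_back(descriptors);
    }
    return model;
}

// Score a scene against one model by counting Lowe ratio-test matches; the model
// in the database with the most matches would be reported as the recognized object.
int countMatches(const cv::Mat& sceneDescriptors, const cv::Mat& modelDescriptors,
                 float ratio = 0.8f)
{
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<std::vector<cv::DMatch>> knn;
    matcher.knnMatch(sceneDescriptors, modelDescriptors, knn, 2);

    int good = 0;
    for (const auto& m : knn)
        if (m.size() == 2 && m[0].distance < ratio * m[1].distance)
            ++good;
    return good;
}
```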

1.3 Thesis Outline

The thesis chapters are organized in approximate chronological order. A multimedia DVD containing images and videos of experimental results is attached to this thesis. The contents of the multimedia DVD are detailed in Appendix A.

1.3.1 Chapter 2: Background Information

This chapter contains two sections. The first section provides an overview of visual symmetry and explains the different types of symmetry. This prevents confusion in future discussions concerning symmetry detection. As symmetry is the core feature of the visual sensing methods to follow, the second section of this chapter provides a literature survey of bilateral symmetry detection methods. Figure 2.3 provides a timeline of research progress on symmetry detection and how past research influences more recent endeavours. The figure also shows how the author's novel symmetry detection approach relates to existing research. To prevent the reader from having to move backwards and forwards between chapters, related works for other chapters are discussed in their respective introductions.

1.3.2 Chapter 3: Symmetry Detection

The fast bilateral symmetry detection approach finds lines of symmetry from image edge pixels using the Hough transform and convergent voting. After discussing the algorithmic details of the fast symmetry approach, detection results on synthetic and real world images are provided with accompanying discussion. The chapter also investigates the computational performance of the detection method. Additionally, this chapter includes a comparison between the generalized symmetry transform [Reisfeld et al., 1995] and the fast symmetry method that evaluates their noise robustness and detection characteristics. The chapter concludes with a comparison of the computational costs between generalized symmetry and fast symmetry using timing trials.

1.3.3 Chapter 4: Sensing Objects in Static Scenes

This chapter details two model-free methods that rely on detected bilateral symmetry to sense objects in static scenes. The first section details a method that uses symmetry to guide object segmentation, automatically extracting the edge contours of objects for which symmetry lines have been detected. The dynamic programming segmentation algorithm and a novel image preprocessing step called the symmetric edge pair transform are detailed in this section. Several segmentation results on real world images are also provided. The section also investigates the computational performance of the segmentation method.

The second section of the chapter details a stereo triangulation method that makes use of detected symmetries across two cameras to obtain three dimensional information. The triangulation process produces three dimensional symmetry axes that are used to localize objects on a table plane. The section also documents experiments that quantitatively measure the accuracy of symmetry triangulation. The section concludes with a qualitative comparison between dense stereo disparity and the proposed symmetry triangulation approach.

1.3.4 Chapter 5: Real Time Object Tracking

This chapter covers research on real time object tracking using symmetry and motion. The chapter details the Kalman filter tracker, a block motion algorithm used to reject unwanted symmetries and a symmetry-refined motion segmentation method. The symmetry tracker is tested on ten real world videos. Additionally, a custom-built pendulum rig is used to quantitatively measure symmetry tracking error against predictable ground truth data. The same pendulum rig is also used to qualitatively compare the performance of colour and symmetry as object tracking features.

Multimedia Content

Videos of the tracking results can be found in the tracking folder of the multimedia DVD.

1.3.5 Chapter 6: Autonomous Object Segmentation

This is the first of two system integration chapters. This chapter details the use of a precise robotic nudge to actuate an object in order to obtain its segmentation autonomously. The robot makes use of stereo symmetry triangulation to localize symmetric objects. The real time object tracker is applied in stereo to monitor robot-actuated objects. The chapter includes segmentation results from twelve experiments conducted on ten different test objects set against different backgrounds. A discussion of the experimental results is provided at the end of the chapter.

Multimedia Content

Autonomous object segmentation results are available alongside corresponding videos of stereo tracking and the robotic nudge from the nudge folder of the multimedia DVD.

1.3.6 Chapter 7: Object Learning by Robotic Interaction

This chapter details an autonomous robot that learns about new objects through interaction. By leveraging the autonomous segmentation approach from the previous chapter, the robot gathers training data and builds object models autonomously. After object segmentation, training data is collected by grasping and rotating an object to gather images covering the entire 360-degree view of the grasped object. SIFT models are built using the robot-collected training images. Experiments on seven bottles show that object models learned autonomously by the robot allow robust and reliable object recognition.

Multimedia Content

Videos and images documenting the autonomous learning process are available from the multimedia DVD in the learning folder.

1.3.7 Chapter 8: Conclusion and Future Work

This chapter provides a summary of how the motivating challenges brought forth in the introduction are addressed by the presented research. As future works are only briefly discussed in previous chapters, a more thorough coverage is provided in this chapter.

1.3.8 Appendix A

Details the contents of the multimedia DVD and provides online URLs where some of the multimedia content is available for download.

1.3.9 Appendix B

Details the design and implementation of a stand-alone motion controller for the PUMA 260 robot manipulator. The appendix includes the direct and inverse kinematic calculations for the PUMA 260 manipulator.


Research is to see what everybody else has seen, and to think what nobody else has thought

Albert Szent-Gyorgyi

2 Background Information

2.1 Visual Symmetry

2.1.1 Symmetry in Human Visual Processing

Symmetry is one of many structural relationships humans use to interpret visual information. Three dimensional structures, such as surfaces of revolution like cylinders and cones, are often inferred from line drawings using symmetry. Gestalt theorists in the early 20th century suggested a set of laws, the Pragnanz, which model the way humans group low level visual entities, such as lines and dots, into objects. Symmetry is one of the features suggested by these theorists as being essential for grouping low level visual entities into objects.

The Pragnanz law of symmetry grouping states that symmetrical entities are seen as belonging together regardless of their distance. This implies that the human vision system tends to cluster symmetric entities together regardless of scale. Computer vision research, especially in areas dealing with producing human-like detection behaviour, often draws on Gestalt theories and the Pragnanz laws as motivation in the design phase. Coupled with the fact that many man-made objects are symmetric, this provides further motivation for the use of symmetry as a visual feature in robotic applications.

2.1.2 Types of Symmetry

To avoid future confusion, the taxonomy of symmetry must be explored before continuing further. Figures 2.1 and 2.2 provide an illustrated summary of the common types of symmetry encountered in images.

Bilateral symmetry, sometimes called reflectional symmetry, is described by a symmetry line. Image data, such as pixel values or contour shape, is equal when reflected across the symmetry line. Note that the terms symmetry line, mirror line, line of reflection and reflection plane are used interchangeably in literature. Figure 2.1(a) is an example of a bilaterally symmetric shape.


(a) Bilateral symmetry (b) Skew symmetry

Figure 2.1: Bilateral and skew symmetry. Symmetry lines are solid black. The skew symmetry shape is produced by horizontally skewing the bilateral symmetry shape by π/4. The black dots are point pairs that are symmetric about a shape's symmetry line.

Skew symmetry occurs when a pattern with bilateral symmetry is skewed by a constant angle. In practice, skew symmetry tends to be found in images where a planar shape with bilateral symmetry is viewed at an angle under weak perspective projection. This is illustrated by the shapes in Figure 2.1. The black dots mark points in each shape that are symmetric about the shape's symmetry line. Notice that bilateral symmetry is a special case of skew symmetry, where the skew angle is zero such that a line joining the symmetric point pair is perpendicular to the symmetry line. Therefore, skew symmetry detectors can also detect bilateral symmetry.

A shape has rotational symmetry if its appearance remains the same after rotation. The order of rotational symmetry is defined as the number of times the repetition occurs during one complete revolution. This is sometimes abbreviated to CN, where N is the order of rotational symmetry. Mathematically, a pattern with rotational symmetry of order N will remain invariant under a rotation of 2π/N. Figure 2.2(a) is an example of a shape with rotational symmetry of order 4.

Radial symmetry, also known as floral symmetry, describes the kind of symmetry found in actinomorphic biological structures such as flowers. Radial symmetry is the special case where a shape has both bilateral symmetry and rotational symmetry. A shape with radial symmetry of order N will have N symmetry lines as well as rotational symmetry of order N. An example of a shape with radial symmetry of order 8 is shown in Figure 2.2(b). Note that the symmetry lines of the shape are drawn as dashed lines. In essence, radial symmetry can be seen as a special case of rotational symmetry, where the spokes are bilaterally symmetric. Radial symmetry of order N can be abbreviated as DN.

Circular structures are described as having radial and rotational symmetry of order infinity. An example of this is shown in Figure 2.2(c). Because of this property, radial symmetry detectors are often applied as circle detectors in computer vision applications such as eye tracking.


(a) Rotational – Order 4 (b) Radial – Order 8 (c) Radial – Order ∞

Figure 2.2: Rotational and radial symmetry. Radially symmetric shapes also exhibit bilateral symmetry. Bilateral symmetry axes are shown as dashed lines.

2.2 Related Research

This section provides an overview of research related to the detection of symmetry in images. Emphasis is placed on bilateral and skew symmetry detection methods as well as their shape representation predecessors. For brevity's sake, the term detection will refer specifically to the task of finding bilateral symmetry in images. For the sake of readability, research related to specific applications of the author's symmetry detector, such as object segmentation and real time tracking, will be covered in their respective chapters.

Figure 2.3 provides a summary of bilateral symmetry detection research over the past few decades. Arrows in the figure indicate the adaptation or application of ideas from previous research. For example, along the right hand side of the figure, the arrow from Brooks to Ponce highlights the fact that Ponce's skew symmetry detection method uses the Brooks ribbon. The detection method developed by the author is shown in bold towards the bottom of the figure.

2.2.1 Accurate Shape Representation

Much of the pioneering work for symmetry detection was carried out by researchers from the digital image processing and medical imaging fields. These researchers were concerned with the concise description of shapes for the automation of visual tasks such as detecting and matching biological entities in medical images. While topology can be used to describe the internal structure of such objects, the lack of uniqueness in the description means that tasks such as shape matching will produce many false positives.

The inadequacy of topology for shape representation was succinctly described in [Blum and Nagel, 1978]. In the paper, Blum and Nagel stated that "topology is so general that all silhouettes without holes are equivalent". Also, as biological structures tend to vary in shape and size between samples, template-based matching will be inherently inaccurate. As such, early symmetry research was focused on providing methods of shape description with high accuracy and uniqueness while being computationally compact in terms of storage. Methods for the detection of symmetry were simply a positive side effect of shape representation research.

Ribbons

The medial axis transform [Blum, 1964], later republished as [Blum, 1967], was the first of many shape representation methods proposed as an alternative to traditional topological approaches. This transform generates a Blum ribbon that represents the internal structure of a shape as well as allowing accurate regeneration of the shape's contour. The Blum ribbon is generated by sliding a circle of variable size along the interior of a shape, making sure that the circumference is in contact with two points of the contour at all times. Computationally, the ribbon is recorded as the loci of circle centers along with a series of radii, one for each locus. This loci of circle centers is also known as the medial axis or skeleton of a shape.

Subsequently, a ribbon method based on sweeping a line of constant angle relative to a shape's skeleton, instead of a circle, was proposed as the Brooks ribbon [Brooks, 1981]. Another approach, using a line touching two points on the shape's contour with mirrored local tangents, was suggested in [Brady and Asada, 1984]. This paper by Brady and Asada is also the first of the ribbon papers that explicitly defines a method for symmetry detection. A summarizing analysis of the three aforementioned ribbons can be found in [Ponce, 1990]. A multi-scale version of the Blum ribbon and the medial axis transform has also been patented [Makram-Ebeid, 2000].

In the ribbons literature, a ribbon's internal structure, such as the loci of circle centers for the Blum ribbon, is also called a spine. The stencil used in the sweep, such as the circle used for Blum ribbons, is called the generator. Note that, unlike bilateral symmetry lines, the spine is allowed to have tree-like branches within an object and does not have to be a straight line. This is why the ribbon spine is also referred to as a shape's skeleton and the process of obtaining it is also called skeletonisation.

Distance Transform Skeletonisation

Skeletonisation approaches using the distance transform [Rosenfeld and Pfaltz, 1966] appeared in the literature around the time of Blum ribbons. Additional research on the storage efficiency of shape representation using the distance transform [Pfaltz and Rosenfeld, 1967] and the parallelization of wavefront computations [Rosenfeld and Pfaltz, 1968] arrived in subsequent years. Unlike ribbons, which deal with smooth, continuous contours, the distance transform approaches deal specifically with digital images and provide discrete shape representations.


In its early incarnation, even with the parallel wavefront propagation extension, the computational cost of the distance transform is very high. The more efficient kernel-based implementation, used commonly in robotics and computer vision, was mathematically formalized two decades later in [Borgefors, 1986].

Generalized Cones

Generalized cones [Nevatia and Binford, 1977] are a computationally efficient alternative to the Blum ribbon for shape representation. The generalized cones approach extracts shape skeletons by first producing piecewise linear skeletal sections at discrete orientations. Through a series of refinement and fusion steps, the skeletal sections are then combined to form a complete skeleton. By using discrete orientations and distances in its calculation, the method lowers computational requirements at the cost of decreased accuracy of representation. It is conceptually similar to a piecewise implementation of the Brooks ribbon.

2.2.2 Towards Symmetry Detection

Early detection methods assume near-ideal registration of the symmetric shape to the image center. The input data used in experiments have low noise, and generally the input data is manually preprocessed to enhance or extract object contours prior to detection. Essentially, object segmentation or recognition is a mandatory preprocessing step to ensure successful detection.

Ribbon-based Approaches

As mentioned earlier, symmetry detection came about as a byproduct of ribbons and shape representation. [Brady and Asada, 1984] describes a ribbon generation method called smoothed local symmetries (SLS), which can be used directly for symmetry detection by producing a locally symmetric skeleton. Earlier, [Blum and Nagel, 1978] also suggested a method called the symmetric axis transform (SAT). The SAT uses the medial axis of Blum ribbons to segment objects into symmetric portions but does not explicitly detect symmetry.

Ribbon-based shape representation methods operate on perfectly extracted contours, with no discontinuities such as gaps. Most methods also require the tangent or gradient along the contour. The ribbon-based methods all make use of hand labelled data, in the form of manually extracted contours. Also note that as ribbons are generated using a parameterized structure swept along a curve, both SAT and SLS can produce branches in the object's symmetry skeleton. Of all ribbon-based approaches, the skeleton produced by SLS is the most similar to an object's bilateral symmetry line as detected by the author's method.


Hough Transform Approach

First proposed as a US patent [Hough, 1962], the Hough transform was originally designed as a noise robust method of line detection. The method was improved by introducing a radius-angle parameterization and extended to detect curves in [Duda and Hart, 1972]. The algorithm was further generalized in [Ballard, 1981] to detect arbitrary shapes. Ballard's method can also deal with scaling and rotation of the target shape.

Soon after Ballard’s paper, a Hough transform approach to symmetry detection was pro-posed [Levitt, 1984]. Levitt’s method performs line detection using Hough transform onthe midpoints between pairs of input points, as well as the original input points. Theuse of the Hough transform provides additional noise robustness not present in other ap-proaches, allowing the method to handle broken contours and operate on sparse data.Coupled with the ease of implementation and simple parameterization, many detectionmethods, including the author’s, stem from Levitt’s seminal work.

Extending Levitt’s Hough transform approach, Ogawa proposed a method to find bilateralsymmetry in line drawings [Ogawa, 1991]. Apart from applying Levitt’s approach todigital images, Ogawa also suggested the use of microsymmetry, local symmetries betweencontour segments similar to those found in ribbons, to find larger scale global symmetry.In essence, Ogawa’s work is the earliest attempt at a multi-scale detection method.

2.2.3 Other Approaches

Research targeting bilateral symmetry detection in digital images began with Atallah's seminal work [Atallah, 1985]. The paper proposed a mathematical approach, with O(N log N) minimum complexity, that can find lines of symmetry in an image containing points, line segments and circles. This global detection method assumes that the input data is symmetric and does not provide discriminatory detection to determine whether symmetry exists.

Marola’s work provides a more general detection method [Marola, 1989a], robust to slightasymmetries in the input data. This paper also provided an important insight into sym-metry detection. Marola suggested that symmetry lines of near-symmetric shapes willtend to deviate from passing through the center of mass of the shape. This is certainlytrue for averaging approaches based around ribbons, but fortunately does not hold truefor voting approaches such as those using Hough transform. In a separate paper, Marolaproposed a symmetry-based object detection method [Marola, 1989b]. The method func-tioned by convolving a mirrored template of the target object with the input image. Asymmetry score is used to judge the correctness of the detection location and orientation.The method requires prior knowledge of the object’s symmetry line.


2.2.4 Skew Symmetry Detection

Ribbon Approach

Revisiting ribbon-based approaches, a detection method for skew symmetry was proposed in [Ponce, 1990]. Ponce's method uses Brooks ribbons with a straight line skeleton. Nevatia and Binford's method for finding generalized cones [Nevatia and Binford, 1977] was applied to improve computational performance over ribbon-based techniques of the past.

In terms of computational complexity, the ribbon-based SLS proposed by Brady has a complexity of O(n²), whereas Ponce's method only has a complexity of O(kn), where k is the number of discrete orientations for which skeletons are produced. The variable n is the number of input data points in both cases, and the reduction in complexity is partially due to the use of midpoints in the generalized cones algorithm. However, Ponce's skew symmetry detection scheme requires manual pruning of input edge pixels.

Hough Transform Approaches

On the left of Figure 2.3, skew symmetry detection methods using the Hough transform can be found. Note that the author's fast symmetry detection method, detailed in the next chapter, can also be extended to detect skew symmetry. The method of [Yip et al., 1994] uses pairs of midpoints, each formed from an edge pair, in a Hough voting process with a complexity of O(N⁴), where N is the number of input edge pixels.

The method of [Cham and Cipolla, 1994] takes a different approach based on edge groups with attached orientation called edgels. The Hough transform is applied as a rough initial detection step. Their method performs a Hough transform on the intersection of edgel gradients. Note that the detection process requires the manual fitting of B-spline curves to edge data so that edgels and their tangents are accurately discovered. This is difficult to automate, especially when the input image produces many noisy edge pixels.

These Hough transform detection methods led to an improved skew symmetry detector [Lei and Wong, 1999], targeting bilateral planar objects under weak perspective projection. This method combined aspects of previous work to provide an automatic detection scheme operating on edge pixels instead of edgels, while also having better computational efficiency than the method of Yip et al. The complexity of this improved method is O(MN²), where N is the number of input edge pixels. Symmetry is detected across M discrete skew angles.

2.2.5 Perceptual Organization

The detection methods covered so far are generally limited to operating on low noise data, in the form of hand segmented object contours or pruned edge images. Ponce was the first to depart from manual object segmentation by using hand picked edges from a Canny edge detector [Canny, 1986]. The first completely data driven detector was proposed for the purposes of perceptual organization [Mohan and Nevatia, 1992]. This detector is capable of finding local symmetries in the form of parallel edge segments without requiring any manual preprocessing of the input data.

2.2.6 Multi-Scale Detection Approaches

With the ever increasing popularity of scale space theory, a multi-scale symmetry detection approach called the generalized symmetry transform (GST) was proposed in [Reisfeld et al., 1995]. This method detects bilateral symmetry using a combination of factors, including image gradient magnitude, image gradient orientation and a distance weighting function that adjusts the scale of detection. This scheme also provides a way to find radial symmetry, which can be used as a corner-like feature detector at low detection scale, similar to Harris corners [Harris and Stephens, 1988]. Section 3.5 contains a comparison between the author's detection approach and GST. An overview of the GST detection steps and its computational requirements are also provided in the same section.

The symmetry distance approach [Zabrodsky et al., 1995] produces a continuous measure of symmetry. The symmetry distance is described as the minimum distance required to make a set of points symmetric. The difficulty of using this detection method lies in the selection of points representative of a shape. The paper documents successful application of this symmetry measure to the tasks of occluded shape completion and the detection of locally symmetric regions in an image.

A multi-scale detection approach making use of probabilistic genetic algorithms for global optimization of symmetry parameters has been proposed [Kiryati and Gofman, 1998]. This method treats bilateral symmetry, namely the location and orientation of the symmetry line along with the scale of symmetry, as parameters in a global optimization problem. A United States patent [Makram-Ebeid, 2000] also proposes a multi-scale detection approach. The method uses Blum ribbons at multiple scales to obtain a median axis for strip-shaped objects.

While the multi-scale approaches described above are an improvement over older detection schemes, they operate under the assumption of low input noise. The lack of noise robustness exhibited by these methods makes them unsuitable for applications that suffer from sensory noise. Also, the time-critical nature of many robotic applications prohibits the use of multi-scale methods due to their high computational costs. As such, these multi-scale methods are rarely applied to the domain of robotics.

2.2.7 Applications of Detection

In the realm of mobile robotics, bilateral symmetry can be used to generate image signatures in conjunction with dynamic programming [Westhoff et al., 2005]. The symmetry image signature allows a mobile robot to compare panoramic images of its surroundings with a database of images collected in the past to perform place recognition. Huebner has since extended this approach to allow multi-scale detection [Huebner, 2007]. The author's approach to object segmentation, described in Section 4.2, also makes use of symmetry-guided dynamic programming to achieve a different goal.

Not included in the full-page figure due to limited space, symmetry detection can also be used to complete occluded shapes [Zabrodsky et al., 1993] and for robust model fitting [Wang and Suter, 2003]. Radial symmetry has been applied to the problems of road sign detection [Barnes and Zelinsky, 2004] and eye detection [Loy and Zelinsky, 2003; Loy, 2003]. In the domain of robotics and object manipulation, bilateral symmetry has also been applied to the modelling of cutlery [Ylä-Jääski and Ade, 1996].

2.2.8 SIFT-Based Approaches

Detection methods for bilateral symmetry [Loy and Eklundh, 2006] and symmetry under perspective projection [Cornelius and Loy, 2006] using mirrored SIFT features [Lowe, 2004] have been proposed. By exploiting the affine and lighting invariance of SIFT features, which are also highly unique, symmetry is detected robustly in noisy real world images. These two robust detection methods have the potential to be applied to a variety of robotics applications. However, the high computational costs of SIFT detection and matching, which at the time of writing are slowly being overcome by graphics processing unit (GPU) implementations, make these methods unsuitable for time-critical applications.


[Full-page figure: a timeline of bilateral symmetry detection research. Nodes include Blum's medial axis transform (1964, 1967); Rosenfeld and Pfaltz's distance transform skeleton (1966–1968); Nevatia and Binford's generalized cones (1977); Blum and Nagel's symmetric axis transform (1978); the Brooks ribbon (1981); Brady and Asada's smoothed local symmetries (1982, 1984); Levitt's symmetric Hough transform (1984); Atallah's symmetry detection (1985); Marola's near symmetry detection (1989); Ponce's skew symmetry detection (1990); Ogawa's line drawing analysis using the symmetric Hough transform (1990); Mohan and Nevatia's symmetry detection using perceptual organization (1992); Yip et al.'s skew symmetric Hough transform using edge pixels (1994); Cham and Cipolla's skew symmetric Hough transform using edgels (1994); Reisfeld et al.'s generalized symmetry transform (1995); Zabrodsky et al.'s symmetry distance (1995); Kiryati and Gofman's symmetry detection using global optimization (1998); Lei and Wong's improved skew symmetric Hough transform (1999); Sun and Si's detection using orientation histograms (1999); Shen et al.'s detection by generalized complex moments (1999); Westhoff et al.'s symmetry detection using dynamic programming (2005); the author's fast bilateral symmetry detection using the Hough transform (2005); Loy and Eklundh's symmetry detection using SIFT and the Hough transform (2006); and Cornelius and Loy's detection of symmetry in perspective using SIFT (2006). Arrows indicate the adaptation of ideas from earlier work, and the author's method is shown in bold.]

Figure 2.3: Research related to bilateral symmetry detection.


Symmetry is what we see at a glance

Blaise Pascal

3 Symmetry Detection

3.1 Introduction

In Section 2.2, a plethora of symmetry detection methods was surveyed. Most of these methods are designed for non-robotic applications, with the majority being from the domains of computer vision and medical image processing. In the proposed robotic system, the symmetry detection results will be used to perform tracking, segmentation and stereo triangulation of new objects, enabling robust object manipulation and autonomous learning. The following issues must be addressed by a symmetry detection method applied in such a robotic system.

Detectability

First and foremost, bilateral symmetry must be detectable for the target objects. According to Nalwa's work on line drawing interpretation [Nalwa, 1988a; Nalwa, 1988b] and the bilateral symmetry of line drawings [Nalwa, 1989], all drawings of an orthographically projected surface of revolution will exhibit bilateral symmetry. The bilateral symmetry line will also coincide with the object's projected axis of revolution. Moreover, as detailed on page 7 and pages 517-573 of [Forsyth and Ponce, 2003], an orthographic projection is simply a special case of weak perspective projection where scaling has been normalized to unity (or negative unity).

In practical terms, Nalwa’s work implies that an object with a surface of revolution, suchas a cup, has visually detectable bilateral symmetry when viewed from many directions. Ifthe symmetry line is measured from two or more view points, stereo triangulation shouldbe possible. The resulting three-dimensional symmetry axis will be the surface’s axis ofrevolution. Deviations from the ideal surface of revolution can be treated as visual noise.In practical terms, this means that bilateral symmetry can be detected as long as therobot’s cameras are not too close to the test objects.


Real Time Operation

For robotic sensing applications such as object tracking, real time operation is essential. Real time performance is especially important during tasks that require immediate sensory feedback, such as object manipulation. Existing methods for bilateral symmetry detection are unable to operate in real time on large images that have a million or more pixels due to their high computational costs. As larger images generally provide more sensory information and improve the upper limit of accuracy in tasks such as tracking and stereo triangulation, symmetry detection methods applied to time-critical robotic applications should be able to operate quickly on large input images.

Robustness to Noise and Asymmetry

In robotic applications, sensor data tend to be noisier than the test image sets encountered in computer vision. The detection method must be able to handle images taken with robotic sensors under real world conditions. The accuracy of posterior estimates from high level information filters, such as a Kalman filter, depends on the quality of their input measurements. Accurate symmetry detection reduces the tracking estimate error and limits the chance of filter divergence. Shadows and specular reflections as well as asymmetric portions of partially symmetric objects must be dealt with robustly. For example, the detection method should be able to detect the symmetry of a mug under non-uniform lighting while ignoring the asymmetric mug handle.

3.2 Fast Bilateral Symmetry Detection

While existing methods of detection are capable of addressing some of the issues detailed in the introduction, none of them can address all the issues simultaneously. For example, the SIFT-based method of [Loy and Eklundh, 2006] can detect symmetry in real world images very robustly but does not operate quickly enough for many real time applications. The high computational costs of existing bilateral symmetry detection methods appear to be a common hindrance when trying to apply them to time-critical robotic applications.

The fast bilateral symmetry detection method was developed to remedy this situation. Herein referred to as fast symmetry, this detection method was initially developed in collaboration with Alan M. Zhang, who participated in the early design discussions that led to the use of an edge pixel pairing approach. The majority of the algorithm design as well as the entirety of implementation and experiments were performed by the author.

The fast symmetry detection method was first published as [Li et al., 2005]. Following this publication, the author further refined the detection process in order to reduce the computational cost of detection and to improve detection robustness. These refinements, including an edge pixel rotation step and allowing angular limits on detection orientations, are detailed in Section 3.2.2 and Algorithm 1. The updated version of the detection algorithm is also available in press [Li et al., 2008].

3.2.1 Novel Aspects of Detection Method

High Detection Speed

The primary novelty of the detection method is the speed of detection. Gains in computational speed are achieved at several stages of the detection method. Firstly, instead of operating on all pixels in an image, only high gradient locations, found using an edge detector, are used as input data. In empirical tests, the Canny edge detector [Canny, 1986] was found to greatly reduce the input data size. For 640 × 480 images, the edge detection step typically reduces the image data to around 10000 edge pixels.

Secondly, a novel Hough transform method using a convergent voting scheme helps reduce the computational cost of detection. A rotation step before Hough voting further reduces computational cost by eliminating trigonometric calculations within the algorithm's inner loop. The rotation step also provides a way to limit the orientation range of detection, which linearly reduces the computational cost of detection.

Noise Robustness

Many existing detection methods rely on local image information such as gradient intensity and orientation. While useful for synthetic images, factors such as object texture and non-uniform lighting can severely disrupt these local features in real world situations. For example, specular reflections can generate large gradient changes on object surfaces that do not represent any structural symmetry present in the physical world. Also, surface texture, such as logos on a soft drink bottle, can introduce symmetric gradients that are independent of the symmetry of an object's contour. To improve noise robustness, fast symmetry only uses the location of edge pixels as input data. This has the added benefit of reducing input data size, thereby reducing the computational cost of detection.

The use of the Hough transform to find lines of symmetry further improves the noise robustness of fast symmetry. The voting process of the Hough transform is able to ignore asymmetric portions of roughly symmetric objects, such as a cup handle. By using a convergent voting scheme, pairs of edge pixels cast single votes during Hough accumulation. This convergent voting scheme produces sharper peaks in the Hough accumulator than the traditional approach of casting multiple votes for each edge pixel. The quantization of parameter space inherent to the Hough transform also provides additional robustness against small errors in edge pixel localization.

Additional noise robustness can be gained by adjusting the parameters of the detection method. A pair of distance thresholds govern the scale of detected symmetry. The allowable pairing distance between edge pixels is controlled by these thresholds. For example, the upper threshold can be lowered to prevent the detection of inter-object symmetry. A pair of orientation parameters can further improve detection robustness by rejecting symmetries with orientations outside a specified range. Further details about these parameters can be found in Section 3.2.2 and Algorithm 1.

Application-Specific Features

As fast symmetry is targeted at robotic applications, specifically that of object segmentation, real time object tracking and stereo triangulation, application-specific features have been added to enhance detection performance. Firstly, as mentioned earlier, orientation limits can be applied to the detection process. Apart from providing a way to include prior knowledge of object orientation to improve detection accuracy and robustness, these orientation limits also provide another advantage. The method has a complexity directly proportional to the number of discrete detection orientations. It follows that by limiting detection to a small range of angles, the computational efficiency of detection is vastly improved. These orientation limits are used in the real time object tracker detailed in Chapter 5.

The detection method also produces a global symmetry line. This is different from many existing methods that produce local symmetries at a particular scale, such as a shape's structural skeleton or an analogue measure of local symmetry. In this respect, fast symmetry is global in that it detects bilateral symmetry as a feature representative of an entire object contour. By detecting global symmetry lines, the method can operate on objects of vastly different visual appearances under difficult lighting conditions. Also, fast symmetry can detect the symmetry lines of transparent objects, textureless objects, multi-colour objects and objects with dense surface texture. Detection methods that represent symmetry as a local feature, such as SIFT-based approaches, have difficulty with transparent and reflective objects due to the unreliable pixel information of the object's surface.

3.2.2 Algorithm Description

Since the first publication of the fast symmetry detection algorithm [Li et al., 2005], the detection method has undergone many changes. The version detailed in Algorithm 1 is the most current incarnation. It is the version used in all of the robotic experiments presented in this thesis. The current version differs primarily in the addition of an edge pixel rotation and grouping step. Also, the original version uses a weighting function based on local gradient orientation to determine the voting contribution of edge pixel pairs. Empirical tests showed that this weighting is not robust to non-uniform scene lighting. As such, it has been discarded from the detection method. Its removal also reduces the computational cost of detection.

The detection process is described programmatically in Algorithm 1.


Algorithm 1: Fast bilateral symmetry detection
Input: I – Source image
Output: sym – Array of symmetry line parameters (R, θ)
Parameters:
  Dmin, Dmax – Minimum and maximum pairing distance
  θlower, θupper – Orientation limits (Hough indices)
  Nlines – Number of symmetry lines returned

 1:  edgePixels ← (x, y) locations of edge pixels in I
 2:  Hough accumulator H[ ][ ] ← 0
 3:  for θindex ← θlower to θupper do
 4:      θ ← θindex in radians
 5:      Rot ← Rotate edgePixels by angle θ (see Figure 3.2)
 6:      for each row in Rot do
 7:          for each possible pair (x1, x2) in current row do
 8:              dx ← |x2 − x1|
 9:              if dx < Dmin OR dx > Dmax then
10:                  continue to next pair
11:              x0 ← (x2 + x1)/2
12:              Increment H[x0][θindex] by 1
13:  for i ← 1 to Nlines do
14:      sym[i] ← max(R, θ) ∈ H
15:      Neighbourhood around sym[i] in H ← 0

The parameter pair θlower and θupper are used to limit the orientations of symmetry lines detected by fast symmetry. The thresholds Dmin and Dmax control the scale of detection by placing limits on the minimum and maximum distance allowed between edge pixel pairs. In practice, Dmin is used to reject small scale symmetry, such as those caused by edge contours with multi-pixel thickness. Dmax is used to reduce the effects of large scale symmetry, which tend to be caused by background edge noise and inter-object symmetry. The parameter Nlines controls the number of symmetry lines returned by the detection method.
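To make the role of these parameters concrete, the following minimal C++ sketch groups them into a single structure; the numeric values are purely illustrative placeholders, not the settings used in the thesis experiments.

// Fast symmetry detection parameters (illustrative values only).
struct FastSymmetryParams {
    int dMin       = 10;    // minimum pairing distance between edge pixels (pixels)
    int dMax       = 400;   // maximum pairing distance; suppresses large scale symmetry
    int thetaLower = 0;     // lower orientation limit (Hough bin index)
    int thetaUpper = 179;   // upper orientation limit (Hough bin index)
    int nLines     = 5;     // number of symmetry lines returned by peak finding
};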

Edge Detection and Sampling

Fast symmetry operates on the edge pixels of an input image. The Canny edge detector [Canny, 1986] is used to generate the edge image. The detection method is given the locations of edge pixels as input. Note that no preprocessing of edge pixels is performed before detection. The Canny thresholds are set to ensure reasonable edge detection results for indoor scenes. These thresholds are fixed across multiple experiments. The thresholds are only modified when a change occurs in the camera's gain or exposure settings. The C++ implementation of fast symmetry uses the Canny edge detection function provided by Intel OpenCV 1.0 [Intel, 2006] with an aperture of 3 pixels.
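As a rough illustration of this input stage, the edge pixel locations could be produced as follows. This sketch uses the newer OpenCV C++ interface rather than the OpenCV 1.0 C interface of the thesis implementation, and the threshold values are placeholders.

#include <opencv2/imgproc.hpp>
#include <vector>

// Produce the (x, y) edge pixel locations that fast symmetry takes as input.
std::vector<cv::Point> extractEdgePixels(const cv::Mat& grey)
{
    cv::Mat edges;
    cv::Canny(grey, edges, 50.0, 150.0, 3);   // aperture of 3 pixels; thresholds are placeholders
    std::vector<cv::Point> edgePixels;
    cv::findNonZero(edges, edgePixels);       // collect the non-zero (edge) pixel coordinates
    return edgePixels;
}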


Hough Transform using Convergent Voting

The Hough transform, first described in a patent [Hough, 1962] and later refined to model lines using polar parameters [Duda and Hart, 1972], is a method used to detect parameterizable curves in a set of points. It is commonly employed to find straight lines and circles in edge images. It has also been generalized to find arbitrary shapes in images [Ballard, 1981]. A vast collection of Hough transform methods and parameterizations is summarized in a survey paper [Illingworth and Kittler, 1988].

Fast symmetry uses a modified Hough transform approach to find symmetry lines. Unlike traditional Hough methods, which require multiple votes for each edge pixel, fast symmetry uses a convergent voting scheme. This modified scheme greatly reduces the total number of votes cast. In exchange, additional computation is required to perform edge pixel pairing prior to voting. The detected symmetry line is parameterized by its radius (R) and orientation (θ) relative to the image center.

Figure 3.1 illustrates the symmetry line parameterization and the Hough transform convergent voting scheme. As a bilateral symmetry line is bidirectional, the angle θ is limited to −π/2 < θ ≤ π/2. Edge pixels, shown as dots, are paired up and each pair contributes a single vote. For example, the edge pixel pair in black, linked by a solid line, contributes one vote to the dashed symmetry line.

Figure 3.1: The Hough transform convergent voting scheme in fast bilateral symmetry detection.

In their work on the randomized Hough transform [Xu and Oja, 1993], Xu and Oja observed that convergent voting reduces Hough accumulation noise and improves the sharpness of peaks in parameter space. Convergent voting also reduces the computational cost of voting in exchange for additional processing to group edge pixels into pairs. This reduction in computational cost is first described in [Ballard, 1981]. Exhaustively pairing N edge pixels has a computational complexity of O(N²). With large N, this will have detrimental effects on a method's real time performance. To overcome this, fast symmetry employs an edge pixel rotation and grouping step before edge pairing to reduce the effective size of N.

Edge Pixel Rotation and Grouping

Figure 3.2 contains a simple example of edge pixel rotation and grouping. Edge pixels, drawn as black dots, are rotated by angle θ. After this rotation, edge pixels with similar y coordinates, belonging to the same scanline, are grouped into rows of a two-dimensional array named Rot. By only pairing values within each row of Rot, the resulting edge pixel pairs will all vote for symmetry lines at the current rotation orientation θ. The example in Figure 3.2 will produce five votes for the symmetry line with orientation θ and radius R = 2.

Figure 3.2: The edge pixel rotation and grouping step in fast bilateral symmetry detection.

This process of edge pixel rotation and grouping provides two benefits. Firstly, random memory access across the θ dimension of the Hough accumulator is removed. This means that only a single row of the accumulator needs to be cached during voting. Secondly, the arithmetic to calculate symmetry line parameters during voting is greatly simplified. The polar radius, R, is found by taking the average of the x coordinates in an edge pair. The orientation does not require any calculation as it is simply the current angle of rotation. By doing this, computationally expensive calculations are avoided during the O(N²) voting process.
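The following C++ sketch condenses the rotation, grouping and convergent voting steps for a single orientation bin. It is illustrative only: the coordinate conventions, the mapping from orientation bin to angle, and the accumulator offset handling are assumptions rather than the thesis implementation.

#include <cmath>
#include <map>
#include <vector>

struct Pt { float x, y; };

// One orientation of the fast symmetry voting loop (sketch).
// accumulator[thetaIndex][round(R) + rOffset] receives convergent votes.
void voteForOrientation(const std::vector<Pt>& edgePixels,
                        int thetaIndex, int numTheta,
                        float dMin, float dMax, int rOffset,
                        std::vector<std::vector<int>>& accumulator)
{
    const float kPi = 3.14159265f;
    const float theta = kPi * thetaIndex / numTheta - kPi / 2.0f;   // map Hough bin to angle
    const float c = std::cos(theta), s = std::sin(theta);

    // Rotate edge pixels by theta and group them by scanline (the Rot array in Algorithm 1).
    std::map<int, std::vector<float>> rot;
    for (const Pt& p : edgePixels) {
        const float xr = c * p.x - s * p.y;                         // rotated x: votes use this axis
        const float yr = s * p.x + c * p.y;                         // rotated y: selects the row
        rot[static_cast<int>(std::lround(yr))].push_back(xr);
    }

    // Convergent voting: each pair within a row casts a single vote for the symmetry
    // line passing through the pair's midpoint at orientation theta.
    for (const auto& rowEntry : rot) {
        const std::vector<float>& xs = rowEntry.second;
        for (std::size_t i = 0; i < xs.size(); ++i) {
            for (std::size_t j = i + 1; j < xs.size(); ++j) {
                const float dx = std::fabs(xs[j] - xs[i]);
                if (dx < dMin || dx > dMax) continue;               // pairing distance thresholds
                const int rIndex = static_cast<int>(std::lround((xs[i] + xs[j]) / 2.0f)) + rOffset;
                if (rIndex >= 0 && rIndex < static_cast<int>(accumulator[thetaIndex].size()))
                    ++accumulator[thetaIndex][rIndex];
            }
        }
    }
}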

As pixel rotation has a complexity of O(N), the edge pixel rotation step is computationally cheap. In addition, the rotation step explicitly allows the use of orientation limits in detection, which can greatly reduce the computational cost of detection. Note also that the effective size of N during the O(N²) edge pairing and voting is reduced by the grouping process, as pairing only occurs between the subset of edge pixels within each row of Rot, as opposed to all edge pixels. The edge pixel rotation and grouping step significantly reduces detection time for large input images.

Peak Finding and Sub-pixel Refinement

Lines 13 to 15 of Algorithm 1 detail the peak finding process that returns the (R, θ) parameters of detected symmetry lines. The peak finding process consists of a maxima search in the Hough accumulator followed by non-maxima suppression. Non-maxima suppression is performed after locating a maximum by zeroing the neighbourhood of bins around the maximum, including the maximum itself. The non-maxima suppression step prevents the detection of multiple symmetry lines with near-identical parameters. In the C++ implementation, a threshold proportional to the highest peak in the Hough accumulator is also used to prevent the detection of very weak symmetries.
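A compact sketch of this peak finding loop is given below. The suppression neighbourhood size is an assumed parameter and the minimum-vote threshold mentioned above is omitted for brevity; the accumulator is passed by value so that the caller's copy is not destroyed by the suppression step.

#include <utility>
#include <vector>

// Find up to nLines peaks in the Hough accumulator, suppressing each peak's neighbourhood.
std::vector<std::pair<int, int>> findSymmetryPeaks(std::vector<std::vector<int>> H,
                                                   int nLines, int suppressRadius)
{
    std::vector<std::pair<int, int>> peaks;                    // (thetaIndex, rIndex) pairs
    for (int n = 0; n < nLines; ++n) {
        int bestT = -1, bestR = -1, bestV = 0;
        for (int t = 0; t < static_cast<int>(H.size()); ++t)
            for (int r = 0; r < static_cast<int>(H[t].size()); ++r)
                if (H[t][r] > bestV) { bestV = H[t][r]; bestT = t; bestR = r; }
        if (bestV == 0) break;                                 // no symmetry votes remain
        peaks.emplace_back(bestT, bestR);
        // Non-maxima suppression: zero the neighbourhood around the peak, peak included.
        for (int t = bestT - suppressRadius; t <= bestT + suppressRadius; ++t)
            for (int r = bestR - suppressRadius; r <= bestR + suppressRadius; ++r)
                if (t >= 0 && t < static_cast<int>(H.size()) &&
                    r >= 0 && r < static_cast<int>(H[t].size()))
                    H[t][r] = 0;
    }
    return peaks;
}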

Hough accumulation is inherently susceptible to aliasing due to the quantization of the (R, θ) parameter space into discrete bins. To improve detection accuracy, sub-pixel refinement is performed to approximate the true peak in parameter space. The term sub-pixel is used as the resulting peak is allowed to lie between Hough accumulator bins.

After locating the maximum value in the Hough accumulator, prior to non-maxima suppression, the 3 × 3 neighbourhood of values centered at the maximum are used to refine the peak location. A Hessian fit of a two dimensional quadratic is used to calculate the sub-pixel offsets. The fitting process is performed as follows:

\[
V =
\begin{bmatrix}
V_{-1,-1} & V_{0,-1} & V_{1,-1} \\
V_{-1,0}  & V_{0,0}  & V_{1,0}  \\
V_{-1,1}  & V_{0,1}  & V_{1,1}
\end{bmatrix}
\qquad
D =
\begin{bmatrix}
\dfrac{V_{-1,0} - V_{1,0}}{2} \\[1ex]
\dfrac{V_{0,-1} - V_{0,1}}{2}
\end{bmatrix}
\qquad
H =
\begin{bmatrix}
V_{1,0} + V_{-1,0} - 2V_{0,0} & \dfrac{V_{1,1} + V_{-1,-1} - V_{-1,1} - V_{1,-1}}{4} \\[1ex]
\dfrac{V_{1,1} + V_{-1,-1} - V_{-1,1} - V_{1,-1}}{4} & V_{0,1} + V_{0,-1} - 2V_{0,0}
\end{bmatrix}
\tag{3.1}
\]

First, the local neighbourhood values around the peak are defined as matrix V such that the maximum is located at V0,0. Treating the 3 × 3 neighbourhood of values as a coarsely sampled surface, the local differences and Hessian are calculated and placed into vector D and matrix H respectively. The sub-pixel offsets are found by solving for Xoff in the following equation.

\[
H X_{off} = D
\tag{3.2}
\]

The offsets in Xoff are added to the maximum location to produce a refined sub-pixel location. Brief experiments on synthetic images suggest that the use of sub-pixel refinement provides a noticeable improvement in detection accuracy. Accuracy is especially improved in cases where the true symmetry line of a shape lies at the quantization boundary between Hough accumulator bins. As sub-pixel refinement is only executed once for each detected symmetry line, it does not contribute significantly to the total computational cost of detection.

3.2.3 Extensions to Detection Method

During the course of research, numerous modifications have been made to the basic detection method. These changes were made in an attempt to improve detection accuracy, robustness, flexibility and performance. Due to various reasons, such as the need to reduce detection execution time, these extensions are not included in the core of the fast symmetry method detailed in Algorithm 1. For the sake of completeness and to encourage future work, this section will briefly describe these extensions.

Skew Symmetry Detection

Bilateral symmetry, as discussed in Section 2.1.2, is a subset of skew symmetry where the symmetry line and the line joining symmetric point pairs intersect at a right angle. By modifying the convergent voting portion of the algorithm, fast symmetry can be extended to detect skew symmetry. In the modified scheme, a discrete range of skew orientations is specified, for which symmetry lines can be detected. Instead of casting single votes, every edge pixel pair casts multiple votes, one for each skew orientation.

As multiple votes are cast for each edge pair, the computational cost increases by a constant factor equal to the number of skew orientations. The skew symmetry detection complexity is O(MN²), where N is the number of edge pixels and M is the number of discrete skew angles for which symmetry can be detected. In order to recover the skew angle, two additional matrices with the same size as the Hough accumulator are needed. The skew angle is recovered using the method suggested in [Lei and Wong, 1999]. An example result of skew symmetry detection is shown in Figure 3.3(b). The detected symmetry line is shown as a solid red line overlaid on top of the source image's edge pixels.

Reducing Vote Aliasing

Early experiments on synthetic images exposed an aliasing problem where votes belonging to the same symmetry line can be split across multiple Hough accumulator bins. For example, if the orientation of a shape's symmetry line is at 0.5 degrees, using a Hough orientation quantization of 1 degree, votes can be spread between bins of orientations 0 and 1 within the Hough accumulator. This aliasing effect means that a symmetric shape and its rotation can produce peaks of different heights in the Hough accumulator.


Figure 3.3: Fast skew symmetry detection. (a) Source image. (b) Detection result.

To overcome this, the votes cast can be spread to reduce the amount of aliasing. Instead of incrementing a single accumulator bin, the values of surrounding bins are also incremented by using a voting kernel. Votes can be spread using a variety of kernels. A 3 × 3 integer approximation of a symmetric Gaussian is a good compromise between computational cost and effective anti-aliasing. Instead of vote spreading, Gaussian blurring of the accumulator after voting also reduces aliasing.
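As an illustration, spread voting replaces the single increment on line 12 of Algorithm 1 with a small kernel update. The integer weights below are one plausible 3 × 3 approximation of a symmetric Gaussian, not the exact kernel used by the author.

#include <vector>

// Cast one convergent vote spread over a 3x3 neighbourhood of accumulator bins.
void castSpreadVote(std::vector<std::vector<int>>& H, int thetaIndex, int rIndex)
{
    static const int kernel[3][3] = { {1, 2, 1},
                                      {2, 4, 2},
                                      {1, 2, 1} };   // integer approximation of a Gaussian
    for (int dt = -1; dt <= 1; ++dt)
        for (int dr = -1; dr <= 1; ++dr) {
            const int t = thetaIndex + dt, r = rIndex + dr;
            if (t >= 0 && t < static_cast<int>(H.size()) &&
                r >= 0 && r < static_cast<int>(H[t].size()))
                H[t][r] += kernel[dt + 1][dr + 1];
        }
}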

However, due to the quantization of parameter space, Hough transform accumulation is inherently susceptible to vote aliasing. The vision systems presented in this thesis do not employ these anti-aliasing improvements as they never rely on the quantity of Hough votes as a direct measure of symmetry strength. However, in applications where a reliable measure of symmetry strength is needed, spread voting schemes or Hough accumulator blurring should be employed to combat vote aliasing.


Preventing the Detection of Non-Object Symmetry

As Canny edge detection and the Hough transform are methods with high noise robustness, fast symmetry is inherently very robust to noisy input data. However, symmetry lines can be detected due to coincidental pairings of edge pixels belonging to different objects or between object and background contours. These noisy symmetries can also occur in images with large quantities of high frequency texture, which results in the generation of many edge pixels that do not belong to any object contour. The visual structure of a scene's background, such as long edge contours from table tops, can also encourage the detection of non-object symmetry.

Figure 3.4 is a simple example of the latter kind of unwanted symmetry. The black dots represent edge pixels, which are overlaid on grey lines representing the contours from which they were detected. As the vertical symmetry line has θ = 0, no edge pixel rotation is needed. Grouping the edge pixels for pairing produces the values in Rot. Notice that the values in Rot will vote for two symmetry lines at the current orientation. The symmetry line at R = −1 will receive votes from the pairs (2,−4), (1,−3) and (0,−2). Three pairs of (3, 1) will contribute votes to the other symmetry line at R = 2. Both symmetry lines will have three votes at the end of the Hough transform.

Figure 3.4: Unwanted symmetry detected from a horizontal line.

The symmetry line at R = 2 is expected as the cup-like U-shaped contour suggests symmetry. However, the symmetry line at R = −1 has the same number of convergent votes but is simply a straight line, not a symmetric shape from a human vision point of view. Technically, the horizontal line does have strong bilateral symmetry as it has many symmetric portions across the symmetry line. In fact, a bilaterally symmetric V-shape will result if the line is bent upwards, using its intersection with the symmetry line as a pivot.


Similarly, an attempt to widen and flatten the U-shaped cup contour will arrive at the horizontal line after passing through shapes that look like the cross-section of a bowl.

The lack of prior assumptions concerning the kind of shapes fast symmetry should target is the primary reason why unwanted symmetries are detected by fast symmetry. To detect symmetries that are likely to belong to objects and reduce the chance of non-object symmetries being found, the edge grouping and voting processes can be modified. Two approaches can be used to steer detection towards symmetry lines of object-like shapes.

Firstly, the specific problem of detecting straight lines as being bilaterally symmetric can be addressed by preprocessing the values in the rows of Rot prior to pairing. Horizontal lines that occur after edge pixel rotation will contribute many values to a single row of Rot. By ignoring the rows of Rot that have too many values, long straight lines will be rejected before pairing, effectively eliminating the problem. For example, the horizontal line's symmetry at R = −1 will not be detected if rows with more than three values in Figure 3.4 are ignored.

However, setting a fixed threshold is difficult as the number of edge pixels and their distribution in an image are scene-dependent. Even with a finely tuned threshold, it is possible to reject rows with edge pairs that belong to a symmetric object contour. Rejecting too many edge pixels from object contours will lead to failed detection. If the threshold is set too low, rows with many edge pixels caused by visual clutter or high frequency texture will be wrongly rejected. Overall, this method of straight line rejection should be employed with a large threshold in situations where the additional noise robustness is sorely needed.
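As a concrete illustration of this row rejection pre-filter, the sketch below empties rows of Rot that hold more entries than a tuned threshold before any pairing takes place. The function name and the representation of Rot as a vector of rows are assumptions made for illustration only, not the thesis implementation.

#include <cstddef>
#include <vector>

using Row = std::vector<int>;  // rotated edge pixel x-coordinates in one 1-pixel scanline

// Empty out rows that hold too many entries; such rows typically come from long
// straight lines or dense texture rather than from a symmetric object contour.
// Row indices are preserved so later pairing code is unaffected.
std::vector<Row> rejectDenseRows(std::vector<Row> rot, std::size_t maxEntriesPerRow)
{
    for (Row& row : rot) {
        if (row.size() > maxEntriesPerRow)
            row.clear();  // reject the whole row before pairing
    }
    return rot;
}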

Secondly, instead of excluding rows of edge pixels from pairing, the convergent voting process can be modified to address the underlying problem. Looking again at the U-shaped contour in Figure 3.4, it appears more symmetric than the horizontal line because it is taller along the direction of its symmetry line. By exploiting this qualitative property of symmetric objects, the convergent voting process can be reformulated to favour tall symmetric contours instead of flat, wide contours.

The voting process is modified by imposing the rule that any Hough accumulator bin is only allowed to be incremented once by each row of Rot. Applying this rule to the example in Figure 3.4, the three edge pixel pairs of (3, 1) continue to vote for the U-shape contour's symmetry line at R = 2. However, for the row with edge pixel values taken from the horizontal line, the voting is very different. Recall that the edge pixel pairs (2,−4), (1,−3) and (0,−2) all vote for the same symmetry line. Following the new voting rule, two of the three identical votes are ignored, reducing the strength of the symmetry line at R = −1 to a single vote. The strength of the non-object symmetry line is now a third of its original. It can be seen that this method is very effective in reducing the quantity of votes contributed by straight line contours as well as dense patches of edge pixels from high frequency texture.
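The sketch below illustrates this once-per-row voting rule for a single row of Rot at one orientation. It is a simplified illustration under assumed data structures (a plain vector of rotated x-coordinates and a one-dimensional accumulator slice), not the thesis code; the mapping from radius values to accumulator indices is also simplified.

#include <cstddef>
#include <cstdlib>
#include <unordered_set>
#include <vector>

// Cast convergent votes for one row of Rot, allowing each radius bin to be
// incremented at most once by this row. Collinear pairs produced by a long
// horizontal line therefore collapse into a single vote.
void votePairsOncePerRow(const std::vector<int>& rowXs,   // rotated x-coordinates in this row
                         std::vector<int>& radiusVotes,   // accumulator slice at the current theta
                         int dMin, int dMax)
{
    std::unordered_set<int> votedBins;                    // bins already incremented by this row
    for (std::size_t i = 0; i < rowXs.size(); ++i) {
        for (std::size_t j = i + 1; j < rowXs.size(); ++j) {
            int separation = std::abs(rowXs[i] - rowXs[j]);
            if (separation < dMin || separation > dMax)
                continue;                                  // outside the pairing distance thresholds
            int radiusBin = (rowXs[i] + rowXs[j]) / 2;     // symmetry line radius for this pair
            if (radiusBin < 0 || radiusBin >= static_cast<int>(radiusVotes.size()))
                continue;                                  // offset handling omitted in this sketch
            if (votedBins.insert(radiusBin).second)        // true only the first time this bin is seen
                ++radiusVotes[radiusBin];
        }
    }
}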


In practice, it is fairly difficult to set up a situation where unwanted symmetries occur regularly. The use of a checkerboard pattern as background can consistently inject non-object symmetry lines into the detection results, lowering the rankings of object symmetries. Higher level processes such as Kalman filtering are generally able to ignore the noisy detection results. Where additional noise robustness is needed, one of the methods described here should be applied based on the demands of the target application. The row rejection method is suggested for applications where robustness against straight lines is needed. Note that as the row rejection method reduces the number of edge pixels paired, the computational cost of detection is also reduced. However, in applications where additional computation costs can be afforded, the latter method of modified voting will be much more effective in rejecting non-object symmetries.

3.3 Computational Complexity of Detection

The fast symmetry algorithm consists of two loops in series. The first, beginning on line 3 in Algorithm 1, performs the edge pixel rotation, grouping and convergent voting steps. The second loop begins on line 13. This loop performs peak finding on the Hough accumulator to return the parameters of detected symmetry lines. Peak finding has a complexity of O(N) where N is the number of Hough accumulator bins. The peak finding loop is repeated Nlines times. As Nlines is small, less than 10 in all detection scenarios, peak finding contributes a very small portion of the overall computational cost of detection.

The voting loop occurs on lines 3 to 12 of the algorithm. As this loop is carried out for θupper − θlower iterations, the computational cost of detection is reduced in a linear manner by reducing the orientation range of detection. For example, limiting the orientation range of detection to ±10 degrees of vertical, which is one-ninth of the maximum range of 180 degrees, will reduce detection time by a factor of nine. This linear reduction in execution time is predicated on the assumption that the distribution of edge pixels in the rows of Rot after edge pixel rotation and grouping is similar for all orientations.

Each cycle of the voting loop contains two major steps. The first step is edge pixel rotation and grouping, which occurs on lines 4 and 5 of Algorithm 1. The rotation and grouping process has a complexity of O(Nedge) where Nedge is the number of edge pixels in the input image. As the edge pixels are rotated by a set of angles that are fixed at compile time, rotation matrices are calculated and placed into a lookup table prior to detection.
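A minimal sketch of such a lookup table is given below, assuming one 1-degree orientation bin per table entry and that only the rotated x-coordinate is needed once pixels are grouped into scanlines; the structure and the rotation convention are illustrative assumptions rather than the thesis implementation.

#include <array>
#include <cmath>

// Precomputed rotation terms, one (cos, sin) pair per 1-degree orientation bin.
struct RotationTable {
    std::array<double, 180> cosTheta{};
    std::array<double, 180> sinTheta{};

    RotationTable() {
        const double degToRad = 3.14159265358979323846 / 180.0;
        for (int t = 0; t < 180; ++t) {
            cosTheta[t] = std::cos(t * degToRad);
            sinTheta[t] = std::sin(t * degToRad);
        }
    }

    // Rotated x-coordinate of an edge pixel (x, y) for orientation bin t.
    double rotatedX(int t, double x, double y) const {
        return x * cosTheta[t] + y * sinTheta[t];
    }
};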

The second step of the voting loop begins on line 6 of the algorithm. This step iterates through the rows of Rot, pairing the x coordinates of rotated edge pixels and performing convergent voting. For each edge pixel pair, their distance of separation is checked against the thresholds Dmin and Dmax. These checks are formulated as if statements on lines 8 to 10. The x coordinates of edge pixel pairs that satisfy the thresholds are averaged to find the radius of the symmetry vote. The radius is labelled as x0 on line 11. On line 12, the convergent vote is cast by incrementing the Hough accumulator.

Edge pairing and convergent voting consume the majority of computational time. Therefore, the computational complexity of detection is primarily dependent on the efficiency of the pairing and voting processes. As such, the complexity of detection can be approximated as follows.

\[
\mathrm{COMPLEXITY} \;\propto\; \left(\frac{N_{edge}}{Rows_{Rot}}\right)^{2} \times Rows_{Rot} \times (\theta_{upper} - \theta_{lower})
\tag{3.3}
\]

Nedge, as used earlier, is the number of edge pixels extracted from the input image. RowsRot is the number of rows in Rot. Assuming uniformly distributed edge pixels across the rows of Rot, the number of edge pixels per row is Nedge/RowsRot. As the edge pairing process has N² complexity, squaring this fraction gives the leftmost term in the multiplication. Performing edge pairing for each row of Rot requires RowsRot repetitions, which results in the middle term. The rightmost term represents the number of angles for which edge rotation and voting takes place, represented as a for-loop on line 3 of the algorithm.

Simplifying Equation 3.3 gives

\[
\mathrm{COMPLEXITY} \;\propto\; \frac{N_{edge}^{2}\,(\theta_{upper} - \theta_{lower})}{Rows_{Rot}}
\tag{3.4}
\]

Equation 3.4 shows that both the orientation range of detection and the number of rows in Rot directly affect the computational cost of detection. As suggested earlier, the detection orientation range can be reduced to improve performance. Note that the orientation range can be changed at runtime, which is used by the real time object tracker detailed in Chapter 5 to reduce detection execution times. With the scanline height fixed, RowsRot depends solely on the size of the input image. For the sake of computational efficiency, scanlines are 1 pixel high. This allows the grouping of edge pixels into Rot to be performed using computationally cheap rounding operations. To ensure that all edge pixels are grouped, RowsRot is set to match the diagonal length of the input image.
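The following sketch illustrates this grouping of rotated edge pixels into 1-pixel-high scanlines by rounding; the centring offset and the container types are assumptions for illustration only.

#include <cmath>
#include <utility>
#include <vector>

// Group rotated edge pixels into the rows of Rot. Each row is a 1-pixel-high
// scanline indexed by the rounded rotated y-coordinate; only the rotated
// x-coordinate is retained for the subsequent pairing step.
std::vector<std::vector<double>> groupIntoRot(
    const std::vector<std::pair<double, double>>& rotatedEdges,  // (x', y') after rotation
    int imageDiagonal)
{
    std::vector<std::vector<double>> rot(imageDiagonal);
    const double offset = imageDiagonal / 2.0;  // assumed offset so that all rows land in the buffer
    for (const auto& e : rotatedEdges) {
        int row = static_cast<int>(std::lround(e.second + offset));
        if (row >= 0 && row < imageDiagonal)
            rot[row].push_back(e.first);
    }
    return rot;
}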

3.4 Detection Results

Before applying the fast bilateral symmetry detection method to robotics applications, it is tested offline to gain a better understanding of the quality and speed of detection. In this section, the results of these tests are partitioned into three parts. The first subsection details the detection of bilateral symmetry in synthetic images. The second subsection shows detection results of the method operating on images of real world scenes containing symmetric objects. The final subsection investigates the computational cost of detection in practice using a series of timing trials.


The same Hough quantization is used across all experiments. The Hough space is quantized into 180 orientation divisions, giving a θ bin size of 1 degree. The number of radius divisions is equivalent to the diagonal of the input image. As such, the R bin size is set to 1 pixel. Canny edge detection thresholds of 30 and 90 are used to extract edge pixels for all test images.
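A small sketch of how an accumulator could be dimensioned from this quantization follows; the structure is hypothetical and only reflects the bin sizes stated above.

#include <cmath>
#include <cstddef>
#include <vector>

// Hough accumulator with 1-degree orientation bins and 1-pixel radius bins,
// where the number of radius bins matches the image diagonal.
struct HoughAccumulator {
    int thetaBins;
    int radiusBins;
    std::vector<int> votes;  // row-major: [theta][radius]

    HoughAccumulator(int imageWidth, int imageHeight)
        : thetaBins(180),
          radiusBins(static_cast<int>(std::ceil(
              std::sqrt(double(imageWidth) * imageWidth + double(imageHeight) * imageHeight)))),
          votes(static_cast<std::size_t>(thetaBins) * radiusBins, 0) {}

    int& at(int thetaBin, int radiusBin) {
        return votes[static_cast<std::size_t>(thetaBin) * radiusBins + radiusBin];
    }
};

// For the 640 x 480 test images the diagonal is 800 pixels, giving a 180 x 800 accumulator.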

3.4.1 Synthetic Images

An old idiom suggests that one should learn to crawl before attempting to run. As such, experimental evaluation of fast symmetry begins with synthetic images. These images are less challenging than the kind of images a domestic robot will encounter. The results of symmetry detection on these synthetic images are shown in Figure 3.5. The symmetry lines detected using fast symmetry are drawn in green on top of the source image. The detection results are organized in columns based on the parameter Nlines, which controls the number of symmetry lines returned by detection. Note that the rightmost column of the figure contains detection results obtained using different Nlines values. Going from top to bottom, 4, 5 and 6 symmetry lines are detected respectively. Apart from Nlines, the same detection parameters are used for all images. The minimum pairing threshold is set to Dmin = 10 to prevent the detection of small scale symmetries that occur along thick lines, such as the triangle surrounding the poison symbol. No maximum pairing threshold is used, nor any orientation limits.

Overall, the detection results in Figure 3.5 appear similar to the kind of bilateral symmetry perceived by the human visual system. The exceptions are the diagonal symmetry lines of the triangular poison symbol and the wide rectangle. In the poison symbol, located at the bottom of the Nlines = 3 column of Figure 3.5, the vertical symmetry line received more Hough votes than the two diagonal lines. The same is true for the horizontal symmetry line of the rectangle, which has more votes than the vertical and diagonal lines. While Section 3.2.3 warns against directly using the Hough accumulator vote count as an indication of symmetry strength, for these synthetic figures, the votes cast for a symmetry line seem to correlate with our human perception of symmetry strength.

3.4.2 Video Frames of Indoor Scenes

After convincing ourselves that the detection method is effective on synthetic images, the next step is to attempt symmetry detection on more difficult data. As the final robot system will operate on graspable objects imaged at arm's reach, images of indoor scenes containing symmetric household objects are used as test data. The test images are 640 × 480 video images captured using a colour IEEE1394 camera. Due to aliasing caused by hardware sub-sampling of the camera's image, which has a native resolution of 1280 × 960, Gaussian blurring is applied to the test images prior to edge detection.


Figure 3.5: Symmetry detection results – Synthetic images. Columns from left to right: Nlines = 1, Nlines = 3, Nlines = 4, 5, 6.

As the assumption that a symmetric shape will occupy the majority of the image is no longer valid, the distance thresholds are adjusted to improve detection robustness. First, Dmin is increased to 25 to limit the effects of small scale symmetry, which is more prevalent in real world scenes due to noisy edge pixels. The upper threshold Dmax is set to 250, which is roughly half the image width, to reduce the probability of detecting non-object symmetry. In practice, given the location of the robot's cameras, the width of objects within reach of the robot's manipulator never exceeds 200 pixels.

Figure 3.6 shows the symmetry line detected by fast symmetry when it is applied to an image of a scene containing a single symmetric object. The detected symmetry line is drawn in red. The edge pixels of the image, which are the detection method's input data, are coloured black. Note that the edge pixels have been dilated to improve their visibility. The successful detection despite the large quantity of non-object edge pixels suggests that fast symmetry is highly robust to asymmetric edge pixel noise.


(a) Input image

(b) Detection result

Figure 3.6: Symmetry detection result – Single symmetric object. The detected symmetry line is drawn in red over the input image's edge pixels, which are dilated and drawn in black.


Moving on to a scene of greater complexity, the detected symmetry lines of an image containing three symmetric objects are shown in Figure 3.7. By setting Nlines = 3, the three symmetry lines with the most Hough votes are returned by detection. Again, edge pixels have been dilated and are drawn in black. This scene has fewer background edge pixels than the last, the presence of which may cause non-object symmetries to be detected ahead of object symmetries. Note that only a single detection pass is needed to recover all three symmetry lines. The detection result suggests that fast symmetry can operate on images with multiple symmetric objects effectively. Also, the green symmetry line shows that fast symmetry can operate on roughly symmetric objects. In this case, the method detected the mug's bilateral symmetry while ignoring the asymmetric edge pixels contributed by its handle.

Next, a failure scenario where non-object symmetries overshadow object symmetry is examined. The threshold Dmin is decreased to 5 pixels and the Canny thresholds are also reduced to increase the number of noisy edge pixels contributing to Hough voting. After applying these changes, Figure 3.8 shows the top five symmetry lines returned by fast symmetry for a scene containing multiple symmetric objects. The numbers next to the symmetry lines indicate their ranking in terms of Hough accumulator vote count, with one being the highest ranked and five the lowest.

While technically not a failure of fast symmetry, the detection of non-object symmetries ahead of symmetries emanating from objects is unwanted in many situations. Moreover, if bilateral symmetry is used as an object feature, this problem needs to be addressed. The middle and bottom images of Figure 3.8 can be used to examine the cause of this problem. Edge pixels that voted for a non-object symmetry line are coloured red. The high frequency surface texture of the multi-colour mug contributes the majority of noisy edge pixels to the non-object symmetries. As fast symmetry does not know what an object is, the problem of detecting non-object symmetry lines cannot be fully resolved. Therefore, higher level processes making use of symmetry detection results must be robust against non-object symmetry lines.

The above problem can be partially overcome through three approaches. Firstly, Dmin can be increased to prevent the pairing of noisy edge pixels within high texture patches. Reverting to the original suggested value of 25 will reduce the strength of symmetry lines 2 and 4. Secondly, increasing the Canny thresholds will reduce the edge pixel noise, especially from faint high frequency texture as seen on the multi-colour mug in Figure 3.8. However, increasing the Canny thresholds too much may result in missing edge pixels in object contours, which can cause missed detection of symmetric objects. Finally, the orientation range of detection can be limited if some prior knowledge of object pose is known. This method is further detailed below.


(a) Input image

(b) Detection result

Figure 3.7: Symmetry detection result – Scene containing multiple symmetric objects. The detected symmetry lines are drawn over the input image's edge pixels, which are dilated and drawn in black.


(a) Detection result

(b) Symmetry line 2

(c) Symmetry line 4

Figure 3.8: Detection of non-object symmetry lines due to edge pixel noise. Edge pixels have been dilated and are coloured red if they voted for symmetry lines 2 or 4.


In situations where some a priori information concerning an object's orientation is known, the orientation range of detection can be restricted to reduce the impact of non-object symmetries. For example, objects with surfaces of revolution, such as cups and bottles, are expected to have near-vertical lines of symmetry when placed upright on a table. By using this knowledge, the θlower and θupper orientation limits can be adjusted to restrict the range of orientations for which symmetries are detected.

Figure 3.9 provides an example where non-object symmetries are successfully rejected by limiting detection orientation. In the case where no orientation limits are used, the object symmetry line is ranked fifth in terms of Hough votes. This low ranking is due to a combination of strong background symmetry and weak object symmetry caused by disruptions in the object's edge contour due to the experimenter's hand. By restricting the orientation range of detection to ±15 degrees of vertical, the object's symmetry line now receives the most votes. Reducing the orientation range also improves detection speed, which is further discussed in Section 3.4.3.

Finally, the detection method is tested on a set of visually challenging objects. Figure 3.10 contains the fast symmetry detection results for these objects with Nlines = 1 so that only one symmetry line is returned. Also, the orientation range of detection is restricted to ±25 degrees of vertical.

The multi-colour mug image poses several challenges. Firstly, the mug is slightly asymmetric due to its handle, which provides a test for the asymmetry robustness of fast symmetry. Secondly, the handle occludes the left side of the mug's contour, reducing the quantity of symmetric edge pixels provided to fast symmetry as input. The reflective can has a shiny surface which may contain symmetries due to reflections of its surroundings. The textured bottle has a lot of edge noise within its object contour due to high frequency surface texture. The transparent bottle has weak contrast with the background, which results in gaps along the object's edge contour.

The successful detection of symmetry lines for these visually difficult objects illustrates the robustness of fast symmetry. The detection results also show that fast symmetry is highly general, capable of dealing with objects of vastly different visual appearances. Another point of note is that fast symmetry can detect the symmetries of reflective and transparent objects. Due to their unreliable surface pixel information, these objects are difficult to deal with using traditional computer vision approaches.


(a) No orientation limit

(b) ±15 degrees of vertical

Figure 3.9: Rejecting unwanted symmetry lines by limiting detection orientation range. The detected symmetry lines are drawn over the input image's edge pixels, which are dilated and drawn in black.


(a) Multi-colour mug (b) Reflective can

(c) Textured bottle (d) Transparent bottle

Figure 3.10: Symmetry detection results – Objects with challenging visual characteristics.


3.4.3 Computational Performance

As many robotic applications demand rapid feature detection due to real time constraints, the computational cost of the proposed detection method must be examined. The effects of input data size and changes in algorithm parameter values on computational cost are examined using detection execution time as a metric. A set of eleven video frames of indoor scenes, which includes the test images in Figures 3.6, 3.7, 3.8 and 3.9, is used as test data. Note that these are the same test images used in previous timing trials documented as Table I in the author's IROS paper [Li et al., 2006].

In the new timing trials, the distance thresholds and edge detection parameters are the same as those used to obtain the results in Section 3.4.2. The number of symmetry lines detected is fixed by setting Nlines = 5. Detection is performed on the full orientation range of 180 degrees. The detection execution time is averaged over 1000 trials. The test platform is an Intel 1.73GHz Pentium M laptop PC. The detection method is coded using C++ and compiled with the Intel C Compiler 9.1.

The results of the timing trials are recorded in Table 3.1. The execution times of the voting and peak finding portions of the code are measured separately. The voting time inclusively incorporates everything between line 1 and line 12 of Algorithm 1. The peak finding time measures lines 13 to 15 of the algorithm, including the sub-pixel refinement step described at the end of Section 3.2.2. Note that Canny edge detection, which takes around 8ms to perform, is not included in the trial times. The time required for the extraction of edge locations from the edge image is included in the voting portion of the execution times.

Table 3.1: Execution time of fast bilateral symmetry detection over 1000 trials

Image number    Number of edge pixels    Voting (ms)    Peak find (ms)    Total (ms)
1               6170                     24.78          3.62              28.40
2               10444                    61.75          3.72              65.48
3               6486                     37.70          3.88              41.59
4               8700                     52.20          4.22              56.42
5               9026                     56.03          3.83              59.86
6               8365                     48.38          3.24              51.63
7               5859                     35.50          3.71              39.21
8               6350                     35.44          4.24              39.68
9               7471                     43.40          4.63              48.03
10              8396                     47.48          4.04              51.51
11              3849                     18.49          4.02              22.51
OVERALL MEAN    7374.18                  41.92          3.92              45.85

A major point of note is the vast improvement in detection speed over the previous timing trials conducted in 2006. The overall mean execution time has been reduced to a third of the previous mean of 150ms. This improvement in execution is especially significant considering that edge sampling was employed in the previous trials to reduce the quantity of input data. The edge sampling process produced a randomly selected subset of the original edge pixels, effectively reducing the input data size by a factor of four. This massive boost in computational performance can be attributed to a variety of coding improvements such as pointer arithmetic and streamlining of mathematical computations. The use of aggressive optimization settings during compilation also decreased detection time significantly.

Section 3.3 hypothesized that reducing the orientation range will linearly improve the computational performance of detection. This hypothesis hinges on the assumption that the distribution of edge pixels across the rows of Rot remains fairly constant for different rotation angles. To test the validity of this hypothesis and the accompanying assumption, the execution time of detection carried out with different orientation limits is measured.

The mean execution time is plotted against the orientation range of detection in Figure 3.11. Each data point represents the mean execution time of eleven 1000-trial detection runs across all the test images, carried out with a specific orientation range. The least squares fit of a linear model to the timing data is shown as a dashed line on the plot. The highly linear relationship between execution time and orientation range supports the hypothesis, thereby also confirming the underlying assumption of similar edge pixel distribution across different rotation angles. The timing results have also experimentally validated the prior assertion that using a ±10 degree orientation limit will lower detection time by a factor of nine. Note that when using narrow orientation limits, the peak finding process may occupy a fairly large portion of the overall detection time.

[Plot of mean voting execution time (ms) against detection orientation range (degrees). Least squares linear fit: y = 0.2277x + 1.1086, R² = 0.9998.]

Figure 3.11: Plot of mean execution time of voting versus detection orientation range.

The timing trial results clearly show that fast symmetry is well suited to time-critical applications, which are abundant in robotics. The overall mean execution time suggests that detection can be carried out at 20 frames per second, and higher frame rates can be achieved by reducing the orientation range of detection. Apart from limiting orientation range, choosing reasonable Canny thresholds and increasing Dmin provides the most noticeable performance gains in practice. The former can reduce the number of input edge pixels, which reduces the quantity of input data supplied to detection. The latter limits the number of edge pairs formed, thereby reducing the number of voting operations performed.

3.5 Comparison with Generalized Symmetry Transform

3.5.1 Introduction

The generalized symmetry transform [Reisfeld et al., 1995], herein referred to as generalized symmetry, is a popular computer vision approach for detecting symmetry in images. This method was designed as a context free attentional operator that can find regions of interest without high level knowledge about the image. Apart from detecting bilateral symmetry, this method can also find skew and radial symmetry. No object segmentation or recognition is required prior to detection. Two symmetry maps are generated by the transform, one representing the magnitude of symmetry, the other phase. Of the two symmetry maps generated, the magnitude map is of particular interest. Performing line detection on the binary result produced by thresholding the magnitude symmetry map will yield similar symmetry lines to those detected using fast symmetry.

The magnitude symmetry map is an image with pixel values representing the symmetry of the input image at a particular location. The value at each pixel of the map is found by summing the symmetry contributions from all possible pairs of pixels in the input image. The symmetry contribution of each pixel pair is calculated by multiplying several weights, each of which represents a different property of symmetry. An adjustable distance weight, containing a Gaussian function whose width depends on a parameter σ, controls the scale of detection. Mirror symmetry in the orientation of local gradients between a pair of pixels is enforced by a phase-based weighting function. The weighting scheme also includes the logarithm of gradient intensity, which favours shapes with strong contrast against their background.
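For reference, the structure of this weighting scheme can be sketched as follows. The notation is adapted from Reisfeld et al.'s formulation as recalled here, and the exact normalisation of the distance weight may differ from the original paper; it is included only to make the multiplicative structure of the weights concrete.

\[
M(p) \;=\; \sum_{(i,j)\in\Gamma(p)} D_{\sigma}\!\bigl(\lVert \mathbf{p}_i - \mathbf{p}_j \rVert\bigr)\; P(i,j)\; r_i\, r_j,
\qquad r_k = \log\bigl(1 + \lVert \nabla I_k \rVert\bigr),
\]
\[
P(i,j) \;=\; \bigl(1 - \cos(\theta_i + \theta_j - 2\alpha_{ij})\bigr)\bigl(1 - \cos(\theta_i - \theta_j)\bigr),
\]

where Γ(p) is the set of pixel pairs whose midpoint is p, D_σ is the Gaussian-shaped distance weight of scale σ, r_k is the logarithmic gradient intensity weight, θ_i and θ_j are the local gradient orientations and α_ij is the orientation of the line joining the pair, so that P(i,j) rewards mirrored gradient orientations.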

3.5.2 Comparison of Detection Results

Using two synthetic images, the detection results of fast symmetry are qualitatively compared with those of the generalized symmetry transform. The test images are 101 × 101 pixels in size so that the shape's symmetry lines do not lie between pixels. The detection results are displayed in Figure 3.12 and Figure 3.13. In the upper left image, the top two symmetry lines detected using fast symmetry are visualized as solid red lines. The edge pixels found using Canny edge detection are shown as green pixels. The bottom two images in the figures are magnitude symmetry maps generated using generalized symmetry with different σ values. Recall that the σ parameter gives control over the scale of detection, with lower values favouring small scale symmetry. No distance thresholds are used for either algorithm. Additional generalized symmetry transform results, including a symmetry map generated from an image of an indoor scene, can be found in the author's IJRR paper [Li et al., 2008].

(a) Input image (b) Fast symmetry Nlines = 2

(c) Generalized symmetry σ = 2 (d) Generalized symmetry σ = 200

Figure 3.12: Test image 1 – Dark rectangle over uniform light background.

Beginning the analysis with Figure 3.12, several conclusions can be drawn from the detection results. Firstly, comparing Figures 3.12(b) and 3.12(d), the bilateral symmetries found using fast symmetry are similar to the large scale symmetry map returned by generalized symmetry with σ = 200. This result agrees with theoretical expectations as fast symmetry is designed to find global, object-like symmetries, especially when no upper distance threshold is applied. By decreasing σ, contributions from large scale symmetries are reduced. Figure 3.12(c) shows small scale symmetry detection results using generalized symmetry. The corners of the rectangular shape are highlighted in the symmetry map because they have strong local symmetry. These symmetric local features are useful for applications such as eye detection and object recognition.

Binary thresholding followed by Hough transform can be applied in series to obtain symmetry lines from the symmetry map in Figure 3.12(d), similar to those detected using fast symmetry. However, as the horizontal line in the symmetry map has much lower intensity than the vertical one, it is difficult to perform thresholding automatically. This large difference in symmetry intensity is due primarily to the distance weighting function in generalized symmetry. The distance function is a Gaussian centered at distance zero, which biases detection towards small scale symmetries. Increasing the σ value spreads the Gaussian, reducing this favouritism, but never entirely reversing it. As such, the generalized symmetry method relies on the large numbers of contributing pixels present in large scale symmetries to overcome this inherent bias in the distance weighting.

The generalized symmetry transform is designed to detect solid uniformly shaded shapes against a contrasting background of an opposing intensity. The test image in Figure 3.13 violates this assumption. The fast symmetry detection results show that the rectangle's main symmetry lines are found despite the edge noise caused by the variation in background intensity. However, the symmetry maps show that the violation of generalized symmetry's foreground-background assumption noticeably deteriorates its detection results.

In Figure 3.13(c), the small scale symmetries of the rectangle's corners are recovered, which shows that the local feature detection behaviour is still functional. However, two horizontal lines above and below the rectangle are also present in the symmetry map. These lines are due to local gradient symmetry between pixels paired along the horizontal sides of the rectangle. Because of the changing background intensity, the dark-to-light gradient orientation points inward on the left side of the rectangle but points outward on the right side. The orientation reverses direction about a vertical axis bisecting the rectangle, passing through the top and bottom sides of the rectangle where the intensity inside and outside the shape is the same. As such, pixels at the top and bottom sides which are equidistant across this vertical axis, where the gradient orientation flips from in to out, will appear symmetric to generalized symmetry.

Moving on to a larger scale of detection, Figure 3.13(d) shows that the rectangle's symmetry is no longer found in the symmetry map. While a horizontal band can be seen, the locations occupied by the symmetry lines detected by fast symmetry are not brightly lit in the symmetry map. Due to the orientation flip of local gradients around the border of the rectangle described in the last paragraph, the gradient orientations of pixels on the left and right sides of the rectangle are parallel, facing the same direction. The phase weighting function of generalized symmetry increases the symmetry contributions of pixel pairs that have local gradient orientations that mirror each other. As local gradients on the left and right border of the rectangle no longer mirror each other, the vertical symmetry has effectively been destroyed in the eyes of generalized symmetry.


(a) Input image (b) Fast symmetry Nlines = 2

(c) Generalized symmetry σ = 2 (d) Generalized symmetry σ = 200

Figure 3.13: Test image 2 – Grey rectangle over background with non-uniform intensity.

These comparisons of detection results are not quantitative. Instead, the comparisons have qualitatively highlighted the main differences between the generalized symmetry transform and fast symmetry. Generalized symmetry is designed for computer vision applications where objects and background obey a dark-on-light or light-on-dark intensity contrast. As fast symmetry is designed to tolerate non-uniform background intensity, it does not suffer from the violation of this assumption. Generalized symmetry is capable of finding local features using a tunable scale parameter whereas fast symmetry cannot. Generalized symmetry also relies on local gradient intensity to determine symmetry strength. Echoing the comments found in the introduction of Kovesi's paper on phase-based symmetry detection [Kovesi, 1997], the use of gradient intensity will bias detection towards high contrast shapes, which may not be more symmetric than a low contrast shape. Assuming the low contrast shape generates equal numbers of edge pixels, fast symmetry does not suffer from this problem.


Overall, the differences between the two methods stem from their assumptions and underlying motivations. Generalized symmetry is designed to be a highly general context-free method to locate local features by leveraging bilateral symmetry and radial symmetry. Symmetric shapes in the input image must have high contrast against a uniform background. As fast symmetry is designed to recognize symmetric objects in real world images quickly and robustly, fewer assumptions are made with regards to a scene's foreground and background. As a trade-off, fast symmetry does not detect local features. Also, fast symmetry does not return the same quantity of information as a symmetry map. The symmetry map contains dense symmetry information that may be useful for applications such as image signature generation. On the other hand, as confirmed by the comparison results, fast symmetry can detect symmetry that eludes generalized symmetry. The fast bilateral symmetry detection method also operates much faster than generalized symmetry. The computational costs of fast symmetry and generalized symmetry are further examined in the next section.

3.5.3 Comparison of Computational Performance

The computational complexities of both algorithms are O(N²), where N is the input data size. However, for the same input image, the size of N is vastly different for each algorithm. Whereas the generalized symmetry transform operates on every image pixel, fast symmetry only operates on the edge pixels of the input image. The practical complexity of generalized symmetry can be reduced to O(Nr²) by limiting the pixel pairing distance to 2r. The parameter r is set so that any pixel pair with nearly zero symmetry contribution due to a large distance of separation is never paired. Given that r < √N, computational cost is reduced. The r parameter is similar to the fast symmetry algorithm's upper distance threshold Dmax, which also limits pairing distance. To maintain fairness during the timing trials, no distance thresholds are used for either algorithm.

The results of the timing trials are shown in Table 3.2 and Table 3.3. The two test images in Figures 3.12 and 3.13 are used as input data in these timing trials.

Table 3.2: Execution time of generalized symmetry transform on test images over 10 trials

Image number    Number of image pixels    Mean execution time (ms)
1               10201                     23505
2               10201                     24489
OVERALL MEAN    10201                     23997

Table 3.3: Execution time of fast symmetry detection on test images over 1000 trials

Image number    Edge pixel count    Canny (ms)    Voting (ms)    Peak find (ms)    Total (ms)
1               280                 0.555         1.447          0.673             2.674
2               363                 0.565         2.052          0.672             3.289
OVERALL MEAN    321.5               0.560         1.750          0.673             2.982


Both algorithms are implemented using C++ and compiled with the Intel C Compiler 9.1. The test platform is an Intel 1.73GHz Pentium M laptop PC.

Comparing the mean execution times, fast symmetry is shown to operate over 8000 times faster than the generalized symmetry method. The large difference in detection time can be attributed to two main factors. Firstly, fast symmetry reduces input data size by the use of Canny edge detection prior to voting. For the test images, this reduced the number of input pixels from over ten thousand to around 300. For more cluttered images, the edge pixel to image pixel ratio is generally less favourable, but still significantly reduces execution time. Secondly, the voting operation of fast symmetry is much more efficient to compute than the weight-based contribution calculations of generalized symmetry. The simple voting procedure used in fast symmetry, as well as the edge pixel rotation and grouping step prior to voting, greatly reduces the computational cost in the O(N²) portion of the algorithm.

The low computational cost of the fast symmetry voting loop can be confirmed by comparing the execution times of both methods when operating on similar quantities of input data. Recall that in Section 3.4.3, the execution times of fast symmetry on 640 × 480 images were measured over many trials. Referring back to the results in Table 3.1, Canny edge detection on test image 2 produced 10444 edge pixels, all of which are processed by fast symmetry. Fast symmetry required a mean execution time of 65.5ms, with Canny edge detection requiring another 8ms. As 101 × 101 test images are used for the current execution time comparison, 10201 image pixels are processed by generalized symmetry. Table 3.2 shows that generalized symmetry took over 23 seconds to generate its result. The massive difference in execution time for similar quantities of input data confirms the low computational cost of the fast symmetry voting method when compared with the symmetry contribution calculations of generalized symmetry.

3.6 Chapter Summary

This chapter introduced the fast bilateral symmetry detection method. This method can rapidly and robustly detect symmetry lines in large images. It can operate directly on video frames of indoor scenes without requiring any manual preprocessing. The bilateral symmetry of near-symmetric objects, such as a mug with a handle, can be detected using fast symmetry. Leveraging the noise robustness of Canny edge detection and Hough transform, fast symmetry has been shown to execute robustly at 20 frames per second on 640 × 480 images. The experimental results also show that fast symmetry can operate on visually difficult objects including those with transparent and reflective surfaces.

The tunable parameters in fast symmetry allow for great flexibility in detection without significantly affecting detection behaviour. For example, distance thresholds can be used to lower the impact of inter-object symmetry. Unlike the σ scale parameter in generalized symmetry, the distance thresholds act like a bandpass filter, capable of limiting symmetry detection to a range of object widths. The execution time of detection can be dramatically reduced by setting limits on the orientation range of detection.

On the other hand, this chapter also highlighted the limitations of fast symmetry. Due to the lack of high level knowledge prior to detection, fast symmetry will detect both object and non-object symmetries. Adjusting the distance thresholds can help remedy this problem. Higher level processes, such as Kalman filtering, should also be equipped with the means to disambiguate the origins of detected symmetry lines. Also, a priori expectations of an object's orientation can be used to limit the angular range of detection to improve robustness and speed.

The global nature of fast symmetry was compared against the multi-scale nature of the generalized symmetry transform. The comparison showed that fast symmetry is able to find symmetry lines in images that confuse generalized symmetry. Execution times of the two methods showed that fast symmetry deserves its titular adjective fast, performing symmetry detection thousands of times faster than generalized symmetry. However, the symmetry lines returned by fast symmetry carry less information than the symmetry maps produced using generalized symmetry, which contain spatial symmetry strengths and local attentional features. Overall, fast symmetry seems more suited to robotic applications, where detection speed and robustness to non-uniform lighting are crucial to operational success.

As mentioned at the beginning of this chapter, real time detection is a major motivation behind the development of fast symmetry. The C++ implementation of the fast symmetry algorithm was the first bilateral symmetry detector to practically demonstrate real time performance on high resolution videos of real world scenes. Since its development, fast symmetry has been applied to a variety of time-critical applications. The results of applying fast symmetry to the problems of object segmentation, stereo triangulation and real time object tracking are presented in upcoming chapters.


Our notion of symmetry is derived from the human face. Hence, we demand symmetry vertically and in breadth only, not horizontally nor in depth.

Blaise Pascal

4 Sensing Objects in Static Scenes

4.1 Introduction

Chapter 3 introduced the fast bilateral symmetry detection method. This symmetry detection method is the backbone of the robot's vision system, providing low-level symmetry features to other vision methods. This chapter details two methods that make use of detected symmetry to segment and triangulate objects in static scenes. These higher level methods are model-free, requiring no offline training prior to operation. Both methods rely solely on edge information, which is visually orthogonal to colour and intensity information. As such, other vision approaches using orthogonal visual information can be applied synergistically with the methods presented in this chapter.

The proposed object segmentation method automatically segments the contours of bilaterally symmetric objects from a single input image. The contour segmentation is performed using a dynamic programming approach. The segmentation method is able to operate on new objects as it does not rely on existing models of target objects. The segmentation method has low computational cost, operating at 20 frames per second on 640 × 480 input images. The segmentation research was carried out in collaboration with Alan M. Zhang, who designed and implemented the dynamic programming portion of the segmentation method. The segmentation research, available in press as [Li et al., 2006] and [Li et al., 2008], is detailed in Section 4.2.

The triangulation method uses stereo vision to localize object symmetry lines in three dimensions. By detecting symmetry in images obtained using a calibrated stereo vision system, pairs of symmetry lines are triangulated to form symmetry axes. These three dimensional symmetry axes are especially useful when dealing with surfaces of revolution. The symmetry triangulation method does not assume reliable surface pixel information across stereo views. This allows the method to triangulate objects that are difficult to handle with traditional approaches, such as those with transparent and reflective surfaces. The symmetry triangulation method is available in press as [Li and Kleeman, 2006a]. The triangulation approach is detailed in Section 4.3.


4.2 Monocular Object Segmentation Using Symmetry

4.2.1 Introduction

Object segmentation methods attempt to divide data into subsets representing useful physical entities. Methods using monocular vision try to group spatially coherent image pixels into objects. In terms of prior knowledge requirements, object segmentation lies between object recognition and image segmentation. Object recognition methods require object models to find instances of objects in the input image. These object models are acquired offline prior to recognition, usually from training conducted on positive and negative images of target objects. For example, multiple boosted Haar cascades [Viola and Jones, 2001], individually trained for each target object, can be used to robustly detect and recognize objects in a multi-scale manner. Image segmentation is on the opposite end of the prior knowledge spectrum. It uses context-independent similarity between image pixels, such as colour and intensity, to segment image regions. Image segmentation methods are surveyed in [Pal and Pal, 1993] and [Skarbek and Koschan, 1994], the latter focusing on approaches using colour. Object segmentation provides useful mid-level information about an image, bridging the gap between low level and high level knowledge.

To improve the generality and flexibility of segmentation, the proposed approach assumes minimal background knowledge, leaning towards image segmentation methods. Similar to the generalized Hough transform [Ballard, 1981], the core assumption is that target objects all share a common parameterizable feature. For the case of the generalized Hough transform, the parameterizable feature may be straight lines, circles or more complicated shapes. Bilateral symmetry is the main feature used to guide segmentation in the proposed approach. This allows a robot using the symmetry-based segmentation method to operate without relying on object models. In fact, the segmentation results can potentially be used as training data to build models for new objects.

Symmetry-aided segmentation has been investigated in [Gupta et al., 2005]. Their method uses symmetry to augment the affinity matrix in a normalized cuts segmentation. Normalized cuts produces accurate segmentations but has a high computational cost. As their approach assumes symmetry of pixel values within an object's contour, the segmentation of transparent objects and objects with asymmetric texture is impossible. Also, their approach is not robust against strong shadows or specular reflections commonly found in real world images due to non-uniform lighting. The drawbacks of requiring symmetry between local image gradients are similar to those previously encountered with the generalized symmetry transform in Section 3.5.2.

Object segmentation is defined as the task of finding contours in an input image belonging to objects in the physical world. This definition implicitly removes the aforementioned assumption of symmetric image gradients, an assumption which is problematic in real world images. An object contour is defined as the most continuous and symmetric contour about a detected symmetry line. This definition of object segmentation allows a robust and low computational cost solution to the segmentation problem at the expense of limiting segmentation targets to those with visual symmetry.

Object contours can be detected from an image's edge pixels. However, simply identifying all edge pixels that voted for a detected symmetry line will produce broken contours that may include noisy edge pixel pairs. As such, a more robust contour detection method is required. Contour detection has a wide literature base in computer vision and medical imaging. Existing methods generally make use of energy-minimizing snakes [Yan and Kassim, 2004] or dynamic programming [Lee et al., 2001; Mortensen et al., 1992; Yu and Luo, 2002]. The proposed approach departs from the norm by removing the need for manual initialization, such as the specification of control points or hand-drawn curves. This added level of autonomy in segmentation is achieved by using dynamic programming in combination with weights based on the bilateral symmetry of edge pixel pairs.

A single pass technique is used so that the segmentation method maintains a stable and predictable execution time. The segmentation method consists of three steps. Firstly, a preprocessing step, detailed in Section 4.2.2, rejects asymmetric edge pairs and weights the remaining edge pairs according to their level of symmetry. This is followed by a dynamic programming step to produce a continuous contour. Finally, this contour is refined, allowing for slight asymmetry, so that the contour passes over the object's edge pixels. The latter two steps are detailed in Section 4.2.3.

4.2.2 The Symmetric Edge Pair Transform

The symmetric edge pair transform, herein referred to as SEPT, is a preprocessing step applied to the input edge pixels prior to dynamic programming contour detection. To save computational cost, the edge pixels used by fast symmetry detection are reused as input data. The edge image is rotated so that the object's symmetry line becomes vertical prior to applying SEPT. The SEPT is detailed in Algorithm 2.

The SEPT performs three main functions. Firstly, it removes edge pixel pairs that are unlikely to belong to an object's contour. Secondly, the remaining roughly symmetric edge pixel pairs are transformed based on their level of symmetry, parameterizing them into the SeptBuf buffer. A dynamic programming step then operates on this buffer. Thirdly, and quite subtly, the SEPT also resolves ambiguities in overlapping edge pixel pairs. These major functions are detailed below.

Rejecting Edge Pixel Pairs

Two criteria are used to reject edge pixels after pairing. Firstly, edge pixel pairs that are too far apart are removed. This is performed on line 6 of Algorithm 2 by comparing the pairing distance against a threshold MAXw.


Algorithm 2: Symmetric edge pair transform (SEPT)
Input:
E – Edge image
Sym – Object symmetry line x-coordinate
Output:
SeptBuf – SEPT buffer
Parameters:
MAXw – Maximum expected width of symmetric objects
MAXmid – Maximum distance between midpoint of edge pair and Sym
W(d) = 1 − d / (2 · MAXmid)

1:  SeptBuf[][] ← −1
2:  foreach row r in E do
3:    foreach edge pixel pair in r do
4:      p ← x-coordinates of edge pixel pair
5:      w ← distance between pixels in p
6:      if w > MAXw then
7:        SKIP TO NEXT PAIR
8:      d ← distance between the midpoint of p and Sym
9:      if d < MAXmid then
10:       if SeptBuf[r][CEIL(w/2)] < W(d) then
11:         SeptBuf[r][CEIL(w/2)] ← W(d)

Recall that the fast symmetry detection method, detailed in Algorithm 1, has a threshold Dmax governing the maximum width of edge pairs. Edge pixel pairs wider than Dmax do not contribute any votes to detected symmetry. As such, MAXw is usually set to Dmax to prevent the inclusion of non-object edge pixels in the final contour.

Secondly, edge pixel pairs that are not roughly symmetric about an object's symmetry line are removed. This is done by comparing the deviation between their midpoints and the symmetry line against a threshold MAXmid. This threshold is in the order of several pixels so that small deviations in the object contour from perfect symmetry are tolerated by SEPT.

Edge Pixel Weighting and SEPT Buffer

Apart from using the midpoint deviation d to reject asymmetric edge pixels, this deviation value is also used to calculate the symmetry weight of the remaining edge pixel pairs. The weighting function W(•) is monotonically decreasing so that large deviations from perfect symmetry result in low weights. After calculating the weight of an edge pixel pair, it is placed into the SEPT buffer SeptBuf at indices (r, w/2), as described by lines 8 to 11 of the algorithm. The vertical coordinate, r, is simply the current row the algorithm is operating on. The horizontal coordinate is taken as the half-width of an edge pixel pair, rounded towards the higher integer using a ceiling function CEIL().


Figure 4.1(b) is an image visualization of the SEPT buffer. The floating-point weights in SeptBuf have been converted to pixel intensities. Buffer cells with strong symmetry weights are given bright pixel values. The reverse is true for low symmetry cells, which are assigned dark values. Buffer cells with −1 weight, indicating edge pixel pair rejection, are coloured black. To improve visibility, only the portion of the SEPT buffer containing the object is shown. Note that there remain many edge pixel pairs that do not belong to the bottle's object contour. The object contour returned by dynamic programming is shown in Figure 4.1(c).

Resolving Ambiguity of Overlapping Edge Pixel Pairs

The monotonically decreasing weighting function W(•) serves another purpose. Due to the asymmetry MAXmid allowed in d, multiple edge pixel pairs with equal separation distance will all parameterize into the same SEPT buffer cell. This ambiguity is best illustrated using a simple numerical example. Let us assume the object's symmetry line has an x-coordinate of Sym = 5. Also, two edge pixel pairs have x-coordinates of p0 = (2, 8) and p1 = (3, 9). Notice that both edge pixel pairs are separated by 6 pixels, which means that both pairs will attempt to place their weights into the same cell of the SEPT buffer. This ambiguity is resolved by only keeping the larger weight. This will favour the edge pixel pair with minimum midpoint to symmetry line deviation. In this example, p0 has a deviation d of (2 + 8)/2 − 5 = 0 and p1 has a deviation of (3 + 9)/2 − 5 = 1. Therefore, the final value in the SEPT buffer will be the weight calculated using edge pixel pair p0 as it is more symmetric. Algorithmically, this is performed by the if statement on line 10.

4.2.3 Dynamic Programming and Contour Refinement

The object contour is extracted from the SEPT buffer using a dynamic programming (DP) method. Using the SEPT buffer as input, the dynamic programming algorithm generates a table of contour continuity scores. The table is the same size as the SEPT buffer. High scoring cells of the table indicate high continuity in the SEPT buffer. As the SEPT buffer is basically a symmetry-weighted edge image, detecting the most continuous contour within the DP score table also implicitly enforces contour symmetry. Note that this approach differs from traditional DP convention as it uses a table of rewards instead of costs.

The details of the DP method are presented in Algorithm 3. Step 1 of the algorithm calculates the score of a current cell from the cells in the row above it by scanning the SEPT buffer vertically. Allowing for 8-connected contours, the maximum vertical continuity score across three neighbour cells is retained. Step 2 of the algorithm performs the same 8-connected scan horizontally, calculating horizontal continuity scores from left to right. Step 3 is a repeat of Step 2 but moving in the opposite direction.


(a) Input image and detected symmetry

(b) SEPT buffer (c) DP contour

Figure 4.1: Overview of object segmentation steps. The top image shows the symmetry line returned by the fast symmetry detector. In the SEPT buffer image, higher weights are visualized as brighter pixel values. The object contour extracted by dynamic programming (DP) has been dilated and is shown in red.


Algorithm 3: Score table generation through dynamic programming

Input:
  SeptBuf – SEPT buffer
Output:
  sTab – Table of continuity scores (same size as SeptBuf)
  backPtrV – Back pointer along vertical direction
  backPtrH – Back pointer along horizontal direction
Parameters:
  MAXw – Maximum expected width of symmetric objects
  {Pver, Rver} – Penalty and reward for vertical continuity
  {Phor, Rhor} – Penalty and reward for horizontal continuity

sTab[ ][ ] ← 0
for r ← 1 to ImageHeight do
  STEP 1: Vertical Continuity
  for c ← 1 to MAXw/2 do
    if SeptBuf[r][c] is not −1 then
      cost ← SeptBuf[r][c] × Rver
    else
      cost ← Pver
    vScore[c] ← MAX{ 0, sTab[r−1][c−1] + cost, sTab[r−1][c] + cost, sTab[r−1][c+1] + cost }
    if vScore[c] > 0 then
      backPtrV[r][c] ← index of cell with max score
      backPtrH[r][c] ← backPtrV[r][c]
  STEP 2: Horizontal Continuity - Left to Right
  prevScore ← −∞
  for c ← 1 to MAXw/2 do
    if SeptBuf[r][c] is not −1 then
      cost ← SeptBuf[r][c] × Rhor
    else
      cost ← Phor
    hScore ← prevScore + cost
    if vScore[c] ≥ hScore then
      prevScore ← vScore[c]
      columnPtr ← c
    else
      prevScore ← hScore
    if prevScore > sTab[r][c] then
      sTab[r][c] ← prevScore
      backPtrV[r][c] ← {r, columnPtr}
  STEP 3: Horizontal Continuity - Right to Left
  Repeat the STEP 2 for loop, with c ← MAXw/2 down to 1


Only the highest continuity score from all three steps is recorded in the score table. The neighbour cell contributing the maximum continuity is recorded in the backPtrV array. Due to multiple horizontal scans, it is possible for cycles to form within the rows of backPtrV. This is resolved in the algorithm by making a copy of backPtrV prior to the horizontal scans. This horizontal continuity pointer is used during backtracking when travelling along horizontal portions of the contour.

Recall that in Figure 3.4 and its accompanying discussion, it was suggested that humans generally consider bilaterally symmetric objects as those with contours roughly parallel to the line of symmetry. To steer contour detection towards object-like contours, the horizontal continuity reward is lower than the vertical reward in order to discourage wide and flat contours. A lower horizontal reward also prevents high frequency zigzag patterns in the final contour.

After generating the score table, the most continuous object contour is found by backtracking from the highest scoring cell to the lowest. The details of the backtracking method are described in Algorithm 4. An example of a contour produced by backtracking is shown in the left image of Figure 4.2. Notice that the segmentation approach is able to generate a reasonable object contour in spite of the occlusion caused by the human hand covering the bottle. Due to the tolerance for asymmetries introduced in the SEPT step, the DP contour produced by backtracking does not correspond exactly to the locations of the input edge pixels. For rough segmentations, the contour obtained thus far is sufficient. However, to improve the accuracy of segmentation, refinement is performed to minutely shift the contour onto actual edge pixels. The refinement step snaps the contour to the nearest edge pixel within a distance threshold. The threshold is set to MAXmid, which governs the level of allowable asymmetry during the SEPT step. Note that the refined contour, unlike the original, is allowed to have small asymmetries between its left and right portions. The results of contour refinement are shown in the right image of Figure 4.2.

Algorithm 4: Contour detection by backtracking

Input:
  sTab, backPtrV, backPtrH – Output of Algorithm 3
Output:
  {Rc, Cc} – {Row, Column} indices of contour

{Rc, Cc} ← indices of MAX(sTab)
while sTab[r][c] ≠ 0 do
  {r, c} ← {Rc, Cc}
  {Rc, Cc} ← backPtrV[r][c]
  if Rc = r then
    {Rc, Cc} ← backPtrH[r][c]
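A possible C++ sketch of the contour refinement described above is given below. Searching for the nearest edge pixel along the contour point's row is an assumption, as is the boolean edge image representation; names are illustrative.

#include <vector>

struct Pixel { int row, col; };

// Illustrative sketch: snap each DP contour point to the nearest edge pixel
// within maxMid columns on the same row (search direction is an assumption).
std::vector<Pixel> refineContour(const std::vector<Pixel>& contour,
                                 const std::vector<std::vector<bool>>& edgeImage,
                                 int maxMid)
{
    std::vector<Pixel> refined = contour;
    for (auto& p : refined) {
        if (edgeImage[p.row][p.col]) continue;              // already on an edge pixel
        int width = static_cast<int>(edgeImage[p.row].size());
        for (int offset = 1; offset <= maxMid; ++offset) {  // expand search outwards
            if (p.col - offset >= 0 && edgeImage[p.row][p.col - offset]) {
                p.col -= offset;
                break;
            }
            if (p.col + offset < width && edgeImage[p.row][p.col + offset]) {
                p.col += offset;
                break;
            }
        }
    }
    return refined;
}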


Figure 4.2: Object contour detection and contour refinement. The object outline returned by dynamic programming backtracking is shown on the left. The refined contour is shown on the right.

4.2.4 Segmentation Results

Figure 4.3 shows the segmentation result of a multi-colour mug. The symmetry line of the object is detected using the fast symmetry detector. The entire segmentation process is carried out automatically without any human intervention. The segmentation result demonstrates the method's robustness against non-uniform object colour. Note that the edge image is quite noisy due to texture on the mug surface and high contrast text on the book. This edge noise does not have any noticeable impact on the quality of segmentation. However, due to the symmetry constraint placed on the segmentation method, the object contour does not include the mug's handle.


(a) Input image

(b) Object contour

Figure 4.3: Segmentation of a multi-colour mug. The refined object contour has been dilated and is shown in purple overlaid on top of the edge image. The object's detected symmetry line is shown in yellow.


Next, the segmentation method is tested against a scene with multiple symmetric objects. Reusing the test image in Figure 3.8, the segmentation method is applied to symmetry lines 1, 3 and 5. The results of the segmentation are shown in Figure 4.4. Notice that all three objects are segmented successfully. However, the contour of the multi-colour mug is smaller than expected, containing the mug's opening only. This is due to shadows and specular reflections, which cause large gaps and asymmetric distortions in the edge contour of the mug. Therefore, the highly symmetric and continuous elliptical contour of the mug's opening is returned by the segmentation method. Note also that there is a slight distortion in the mug's elliptical contour. This is caused by gaps in the outer rim edge contour of the mug's opening. These gaps reduced the continuity of the outer rim contour, causing the object contour to include some portions of the inner rim.

Figure 4.4: Object segmentation on a scene with multiple objects. The images in the periphery each show the detected contour overlaid on top of the edge image. The contours are rotated so that the object's symmetry line is vertical.

4.2.5 Computational Performance

The object segmentation software is implemented using C++. The experimental platform is a desktop PC with an Intel Xeon 2.2GHz CPU. No processor-specific optimizations such as MMX and SSE2 are used. No distance thresholds are applied to the segmentation method. The maximum expected object width, MAXw, is set to the image width. These trials were conducted prior to obtaining a copy of the Intel C compiler. As such, unlike the timing trials performed in Section 3.4.3, aggressive optimization settings were not used during compilation of the segmentation source code.

Table 4.1 contains execution times of the segmentation method operating on 640 × 480 images. Note that the test images are the same as those used in the fast symmetry detection time trials recorded in Table 3.1. The execution times include the contour refinement step, which takes around 6ms on average. Fast symmetry detection execution times are not included. The values in the column titled Number of edge pixel pairs are the quantity of edge pixel pairs formed during SEPT processing. Given that further optimizations are possible during compilation and the MAXw distance threshold can be lowered, the mean execution time suggests that the segmentation method can operate comfortably at 30 frames per second.

Table 4.1: Execution time of object segmentation

Image number    Number of edge pixel pairs    Execution time (ms)
1               77983                         30
2               142137                        29
3               65479                         22
4               68970                         25
5               67426                         43
6               44901                         44
7               90104                         32
8               133784                        32
9               121725                        48
10              177077                        39
11              51475                         38
OVERALL MEAN    94642                         35

4.2.6 Discussion

The proposed segmentation method is a mid-level approach to obtain additional information about symmetric objects. As the method does not require any object models prior to operation, new symmetric objects can be segmented. This may allow the use of object contours to gather training data so that reoccurring objects can be recognized. The proposed method can segment multiple objects from a scene, assuming background symmetry lines are rejected prior to segmentation. It may also be possible to use the results of segmentation to reject background symmetries by examining the length and width of detected contours. The fast execution time of the segmentation method suggests that it is well suited for time-critical situations.


The lack of prior information about objects is a double-edged sword. A recognition-based system trained on hand-labelled and manually segmented images will outperform the proposed method in terms of segmentation accuracy and robustness. However, the proposed model-free approach frees a robotic system from mandatory offline training, which is very time consuming for large object databases. Also, exhaustive modelling of all objects is impractical for real world environments such as the household, where new objects may appear sporadically without warning.

Asymmetric objects cannot be segmented using the proposed approach. This is acceptable as visually orthogonal segmentation cues such as colour and shape can be used synergetically with symmetry. However, asymmetric portions of roughly symmetric objects, such as cup handles, are also excluded from the final contour. Also, the disambiguation of object and background symmetries remains unresolved. Additional visual information and background knowledge are required to solve these problems robustly. In Chapter 6, both problems are resolved by the careful and strategic application of autonomous robotic manipulation to simultaneously reject background symmetries and segment objects. By actively moving objects to perform segmentation, asymmetric portions of near-symmetric objects are included in the segmentation results.

4.3 Stereo Triangulation of Symmetric Objects

4.3.1 Introduction

A plethora of stereo algorithms have been developed in the domain of computer vision. Despite their many differences, the algorithmic process of stereo methods designed to obtain three dimensional information can be broadly generalized into the following steps. Firstly, the intrinsic and extrinsic parameters of the stereo camera pair are found through a calibration step. Note that some stereo approaches do not require calibration or obtain the camera calibration online during stereo triangulation.

After calibration, the next stage of most stereo algorithms can be described as correspondence. This stage tries to pair portions of the left and right images that belong to the same physical surface in a scene. The surface is usually assumed to be Lambertian, so that it appears similar in both camera images. Once corresponding portions have been found, their distance from the camera can be triangulated using the intrinsic and extrinsic parameters.

Sparse or feature-based stereo approaches, more commonly used in wide baseline and uncalibrated systems, triangulate feature points to obtain 3D information. Recent sparse stereo approaches generally make use of affine invariant features such as maximally stable extremal regions (MSER) [Matas et al., 2002]. Dense stereo methods attempt to find correspondences for every pixel. Local patches along epipolar lines are matched between the left and right camera images using a variety of distance metrics. Common matching metrics include sum of squared differences (SSD) and normalized cross correlation (NCC). Depending on the time available for processing, dense stereo approaches may also utilize a variety of optimization methods to improve the pixel correspondences. Global optimization approaches make use of algorithms such as dynamic programming or graph cuts to optimize across multiple pixels and across multiple epipolar lines. Dense stereo approaches are surveyed in [Scharstein and Szeliski, 2001; Brown et al., 2003].

The proposed stereo triangulation approach uses bilateral symmetry found using the fast symmetry detector. As symmetry is a structural feature, not a surface feature, the information returned by the proposed stereo method is different from existing approaches. In contrast to the surface 3D information returned by dense and sparse stereo methods, the proposed method returns a symmetry axis passing through the interior of the object. By looking for intersections between a symmetry axis and a known table plane, bilaterally symmetric objects can be localized in three dimensions. Also, the symmetry axis can be used to bootstrap model-fitting algorithms by providing information about an object's pose.

Symmetry triangulation can also deal with objects that have unreliable surface pixel information, such as those with reflective and transparent surfaces. These objects are difficult to deal with using traditional stereo methods as these methods rely on the assumption that a surface appears similar across multiple views. Symmetry triangulation makes use of edge pixel information only, so this assumption is not necessary. As such, the approach elegantly generalizes across symmetric objects of different visual appearances without requiring any prior knowledge in the form of object models.

In the context of robotics, the method is especially useful when dealing with surfaces of revolution. As discussed at the beginning of Chapter 3, the triangulated symmetry axis of a surface of revolution is the same as its axis of revolution. Assuming uniform mass distribution, the symmetry axis of a surface of revolution object will pass through its center of mass. As such, robotic grasping force should be applied perpendicular to the symmetry axis to ensure stability. The structural information implied by bilateral symmetry may also be useful for determining object pose before and during robotic manipulation.

4.3.2 Camera Calibration

The stereo camera pair is calibrated using the MATLAB calibration toolbox [Bouguet, 2006]. Both intrinsic and extrinsic parameters are estimated during calibration. The intrinsic parameters model camera-specific properties, such as focal length, pixel offset of the optical center and radial lens distortion. The extrinsic parameters model the geometric pose of the camera pair, providing the necessary translation and rotation matrices to map one camera's coordinate frame into the other. Note that there are some stereo methods that do not require any prior camera calibration. Instead, these stereo methods use triangulated features to recover camera calibration parameters during operation.


Figure 4.5 shows the extrinsics of the stereo cameras and the physical setup of the experimental rig. The stereo cameras on the robot's head are positioned at roughly arm's length from the checkerboard pattern. At the top of the figure, the red triangles in the plot each represent a camera's field of view. The plot's origin is located at the focal point of the left camera. The cameras are verged towards each other to provide a larger overlap between image pairs when viewing objects at arm's reach. The vergence angle is roughly 15 degrees and the right camera is rotated slightly about its z axis due to mechanical errors introduced by the mounting bracket.

4.3.3 Triangulating Pairs of Symmetry Lines

Due to the reduction from three dimensions down to a pair of 2D images, stereo correspondence is not a straightforward problem. Apart from the obvious issue of occlusions, where only one camera can see the point being triangulated, other problems can arise. For example, specular reflections and non-Lambertian surfaces can cause the same physical location to appear differently in each stereo image, causing incorrect correspondences. The proposed method attempts to provide a robust solution by using symmetry lines as the primary feature for stereo matching. By using symmetry, which does not rely on object surface information, reflective and transparent objects are able to be triangulated successfully.

The 3D location of an object's symmetry axis is triangulated using the following method. Firstly, the symmetry line is projected out from the camera's focal point. The projection forms a semi-infinite triangular plane in 3D space. This projection is done for both camera images using their respective detected symmetry lines. After this, the line of intersection between the triangular planes emanating from each camera is calculated. Assuming the result is not undefined, the triangulation result is simply this line of intersection.

The projection of a symmetry line into a triangular plane in 3D space is performed as follows. The first step is to locate the end points of the symmetry line in the camera image. This begins by finding the pivot point on the symmetry line. A line joining the image center with the pivot point has length equal to the symmetry line's radius parameter. The adjoining line is perpendicular to the symmetry line. Note that the following mathematics assume that pixel indices begin at zero, not one.


(a) Extrinsics of verged stereo camera pair

(b) Experimental rig

Figure 4.5: Stereo vision hardware setup.


xr = r cos θ + (w − 1)/2        (4.1a)

yr = r sin θ + (h − 1)/2        (4.1b)

xr and yr are the horizontal and vertical pixel coordinates of the symmetry line's pivot point. r and θ are the polar parameters of the symmetry line. w and h are the width and height of the camera image in pixels. The nearest image borders intersecting with the symmetry line are found using a vector between the pivot point and the image center.

dA = MIN{ xr/sin θ, (xr − w + 1)/sin θ, −yr/cos θ, (h − 1 − yr)/cos θ }        (4.2a)

dB = MIN{ −xr/sin θ, (w − xr − 1)/sin θ, yr/cos θ, (1 + yr − h)/cos θ }        (4.2b)

dA and dB are distances to the nearest image borders from the end points of the symmetry line. The special cases where division by zero occurs due to the sin and cos denominator terms are dealt with programmatically by removing them from the MIN function.

pA = {−dA sin θ + xr, dA cos θ + yr} (4.3a)

pB = {dB sin θ + xr, −dB cos θ + yr} (4.3b)

The end points of the symmetry line are found by calculating pA and pB. Note that this process is performed twice for each symmetry line pair, to obtain two pairs of end points, one for the left camera and one for the right camera. Note that the end point calculations described here are also used in the visualization code for drawing detected symmetry lines onto input images.
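A C++ sketch of Equations 4.1 to 4.3 is shown below. Divide-by-zero candidates are dropped from the MIN, mirroring the programmatic handling described above; dropping negative candidates is an additional assumption about the intended clipping behaviour, and all names are illustrative.

#include <algorithm>
#include <cmath>
#include <initializer_list>
#include <limits>
#include <utility>

struct Point2 { double x, y; };

// Returns the two end points of a symmetry line (r, theta) clipped to a w x h image.
std::pair<Point2, Point2> symmetryLineEndPoints(double r, double theta, int w, int h)
{
    const double s = std::sin(theta), c = std::cos(theta);
    const double inf = std::numeric_limits<double>::infinity();

    // Pivot point (Equations 4.1a, 4.1b)
    const double xr = r * c + (w - 1) / 2.0;
    const double yr = r * s + (h - 1) / 2.0;

    // Keep only finite, non-negative border distances (degenerate terms removed)
    auto safeMin = [&](std::initializer_list<double> candidates) {
        double best = inf;
        for (double d : candidates)
            if (std::isfinite(d) && d >= 0.0) best = std::min(best, d);
        return best;
    };

    const double dA = safeMin({ xr / s, (xr - w + 1) / s, -yr / c, (h - 1 - yr) / c });
    const double dB = safeMin({ -xr / s, (w - xr - 1) / s, yr / c, (1 + yr - h) / c });

    // End points (Equations 4.3a, 4.3b)
    Point2 pA{ -dA * s + xr,  dA * c + yr };
    Point2 pB{  dB * s + xr, -dB * c + yr };
    return { pA, pB };
}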

Next, the symmetry line end points are normalized according to the camera's intrinsics before projecting them to form a triangular plane. This process removes image centering offsets and normalizes the focal length of both cameras to unity. Radial lens distortion is also taken into account in the normalization. The normalization code, implemented in C++, is based on the MATLAB Calibration Toolbox routine found in comp_distortion_oulu.m.

The normalized end points of the symmetry lines are then projected out into 3D space to form a plane. The projection of a single point is performed as follows.

pp = zmax · [pnorm; 1]        (4.4)


The constant zmax governs the depth of the triangular plane projected from the camera. In the experiments presented in this section, zmax is set to 2.0m so that symmetry axes of objects well out of arm's reach of the robot do not appear in the triangulation results. A '1' is appended to the normalized symmetry end point location, pnorm, to produce the homogeneous coordinate vector necessary for projection. The result of the projection, pp, is a point in 3D space.

The end points of the left and right camera symmetry lines are projected into 3D space. Note that the projected end points of the left symmetry line have a different frame of reference to those projected from the right symmetry line. The projected points of the left symmetry line are brought into the right camera coordinate frame using the following matrix equation.

ppright = Rc ppleft + Tc        (4.5)

The 3 × 3 rotation matrix Rc and the translation vector Tc are the extrinsic parameters between the left and right camera coordinate frames obtained from stereo camera calibration. Now that all projected points are in the right camera coordinate frame, symmetry triangulation can be performed. Two triangular planes are formed by linking each camera's focal point with the projected end point pairs of their respective symmetry lines.
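The two operations in Equations 4.4 and 4.5 amount to a scale and a rigid transform; a minimal C++ sketch is given below, with 3-vector and 3×3 matrix types defined inline for self-containment. Variable names are illustrative.

#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Project a normalized image point (Equation 4.4) out to depth zmax, giving a
// 3D point in that camera's coordinate frame.
Vec3 projectToDepth(double xnorm, double ynorm, double zmax)
{
    return { zmax * xnorm, zmax * ynorm, zmax };
}

// Map a point from the left camera frame into the right camera frame
// (Equation 4.5) using the calibrated extrinsics Rc (rotation) and Tc (translation).
Vec3 leftToRight(const Mat3& Rc, const Vec3& Tc, const Vec3& pLeft)
{
    Vec3 pRight{};
    for (int i = 0; i < 3; ++i)
        pRight[i] = Rc[i][0] * pLeft[0] + Rc[i][1] * pLeft[1]
                  + Rc[i][2] * pLeft[2] + Tc[i];
    return pRight;
}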

Tomas Moller's triangle intersection method [Moller, 1997] is used to find the intersection between the two triangular planes. The entire symmetry triangulation method is implemented using C++. The compiled binary runs at 5 frame-pairs per second on a Pentium M 1.73GHz laptop PC when operating on pairs of 640×480 images. This frame rate includes Canny edge detection and fast bilateral symmetry detection for both stereo images. Note that the Intel C Compiler was not available during the development of the symmetry triangulation software, so higher frame rates would be achieved with more aggressive optimization during compilation.

4.3.4 Experiments and Results

Triangulation experiments were carried out on six symmetric objects with different visual appearances. All six test objects can be seen in Figure 4.6. The test objects include low texture, reflective and transparent objects, all of which are challenging for traditional stereo methods.

For each test object, 4 image pairs are used as test data, resulting in a total of 24 image pairs. The data set of the multi-colour mug is shown in Figure 4.7. The test object's symmetry line is centered on each of the 4 outer corners of the checkerboard pattern. Test images are taken with the stereo cameras looking down at the checkerboard, with the object roughly at arm's length. This is to simulate the kind of object views a humanoid robot would encounter when interacting with objects on a table using its manipulator.


(a) White cup (b) Multi-colour mug

(c) Textured bottle (d) White bottle

(e) Reflective can (f) Transparent bottle

Figure 4.6: Test objects used in the triangulation experiments.


(a) Location 1

(b) Location 2

(c) Location 3

(d) Location 4

Figure 4.7: Example stereo data set – Multi-colour mug. The left camera image is shown on the left for each location's stereo pair.


Figure 4.8 shows the stereo triangulated symmetry axes of the reflective metal can. The red lines are the triangulated symmetry axes of the can when it is placed at the four outer corners of the checkerboard pattern as demonstrated in Figure 4.7. The blue dots are the corners of the checkerboard. The stereo camera pair can be seen in the upper left of the figure.

Figure 4.8: Triangulation results for the reflective metal can in Figure 4.6(e).

4.3.5 Accuracy of Symmetry Triangulation

To examine the accuracy of symmetry triangulation, each triangulated symmetry axis is compared against a known ground truth location. To obtain ground truth, an additional image pair is taken with no objects on the checkerboard. This is done for each data set, to ensure that small movements of the checkerboard when changing test objects do not adversely affect the accuracy of ground truth data. The corner locations of the checkerboard in 3D space are found by standard stereo triangulation using the camera calibration data. The four outer corner locations are used as ground truth data in the triangulation accuracy measurements.

The table geometry is approximated by fitting a Hessian plane model to all triangulated corner locations. Standard least mean squares fitting is used. The plane Hessian provides the 3D location of the table on which the test objects are placed. Using the plane model, an intersection between the object's triangulated symmetry line and the table plane is found for each test image pair. The Euclidean distance between this point of intersection and the ground truth corner location is used as the error metric.
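A compact C++ sketch of this error metric is given below, assuming the table plane is stored in Hessian normal form (unit normal n and offset d with n·p = d) and the symmetry axis is stored as a point and a direction; names are illustrative.

#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

static double dot(const Vec3& a, const Vec3& b)
{ return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

// Intersect a triangulated symmetry axis (point p0, direction dir) with the
// table plane in Hessian normal form. Returns false if the axis is parallel
// to the plane, in which case no intersection point exists.
bool axisPlaneIntersection(const Vec3& p0, const Vec3& dir,
                           const Vec3& n, double d, Vec3& hit)
{
    double denom = dot(n, dir);
    if (std::fabs(denom) < 1e-9) return false;      // axis parallel to table
    double t = (d - dot(n, p0)) / denom;
    hit = { p0[0] + t*dir[0], p0[1] + t*dir[1], p0[2] + t*dir[2] };
    return true;
}

// Error metric: Euclidean distance between the intersection and ground truth.
double triangulationError(const Vec3& hit, const Vec3& groundTruthCorner)
{
    Vec3 e{ hit[0]-groundTruthCorner[0], hit[1]-groundTruthCorner[1],
            hit[2]-groundTruthCorner[2] };
    return std::sqrt(dot(e, e));
}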

The following steps are used to measure the triangulation accuracy using the error metric. Firstly, the top five symmetry lines are found for each image in a stereo pair. Next, all possible pairings between symmetry lines from the left and right camera images are generated. Stereo triangulation is performed on all possible symmetry line pairs. A triangulated symmetry axis is ignored if it is too far from the camera pair, as it cannot be reached by the robot. Triangulated symmetry axes that are not orientated within ±15 degrees of the checkerboard's surface normal are also ignored. If no valid symmetry axes are found, the triangulation is considered to have failed. Only the intersection points between the remaining valid symmetry lines and the table plane are recorded. After obtaining a list of intersection points for all test image pairs, the triangulation error is measured using the aforementioned error metric. In the case where multiple symmetry axes are found for a single ground truth datum, the symmetry axis closest to ground truth is used to calculate the triangulation error.

Table 4.2 shows the mean triangulation error for the test images. The mean error is calculated across four triangulation attempts, one at each outer corner of the checkerboard pattern. There is only a single triangulation failure among the 24 test image pairs. The failed triangulation occurred for Location 3 in Figure 4.7. The failed triangulation is due to self occlusion caused by the mug's handle, which severely disrupted the object contour in the left camera image. This disruption resulted in failed symmetry detection for the left camera image.

Table 4.2: Triangulation error at checkerboard corners

Object                Mean error (mm)
White cup             13.5
Multi-colour mug      6.8 †
White bottle          10.7
Textured bottle       12.4
Reflective can        4.5
Transparent bottle    14.9

† Triangulation failed for location 4

The mean error of the successful triangulations is 10.62mm, with a standard deviation of 7.38mm. Only a mean of 1.5 symmetry axes are generated for each test image pair. As five symmetry lines are detected for each test image, 25 pairing permutations are given to the triangulation method for each image pair. The small number of valid symmetry axes returned by symmetry triangulation suggests that it is possible to reject most non-object symmetry axes by using the aforementioned distance and orientation constraints.


4.3.6 Qualitative Comparison with Dense Stereo

Dense stereo approaches provide 3D information about the surface of an object. This is different from the symmetry axis returned by symmetry triangulation, which is a structural feature of an object. A symmetry axis is always inside an object, and usually passes near its center of volume. In this section, a qualitative comparison is performed between dense stereo disparity and symmetry axes returned by symmetry triangulation. Sparse stereo approaches are excluded from this comparison as they generally do not rely on camera calibration, which would unfairly bias the comparison in favour of symmetry triangulation.

Disparity maps are generated using the dense stereo C++ code from the Middlebury Stereo Research Lab [Scharstein and Szeliski, 2001]. The input images are rectified using the MATLAB calibration toolbox prior to disparity calculations. After testing multiple cost functions and optimization approaches, the sum of squared differences (SSD) using 15×15 windows is found to produce the best disparity results for the test image set. Global optimization methods, such as dynamic programming, did not provide any significant improvements. Figures 4.9 to 4.11 contain the disparity results for several test objects. Darker pixels have lower disparity, meaning that the locations they represent are further from the camera. The object's location in the disparity map is marked with a red rectangle. Note that grayscale images are used to generate the disparity maps.
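For illustration, a brute-force SSD block matcher over rectified grayscale pairs is sketched below; it is not the Middlebury code used in these experiments, and the window and disparity ranges are placeholder parameters (win = 7 gives the 15×15 windows mentioned above).

#include <cstdint>
#include <limits>
#include <vector>

// Illustrative winner-take-all SSD matcher. Images are row-major, width w,
// height h; returns an integer disparity per pixel (0 outside the valid band).
std::vector<int> ssdDisparity(const std::vector<uint8_t>& left,
                              const std::vector<uint8_t>& right,
                              int w, int h, int win = 7, int maxDisp = 64)
{
    std::vector<int> disparity(w * h, 0);
    for (int y = win; y < h - win; ++y) {
        for (int x = win + maxDisp; x < w - win; ++x) {
            long bestSsd = std::numeric_limits<long>::max();
            int bestD = 0;
            for (int d = 0; d <= maxDisp; ++d) {
                long ssd = 0;
                for (int dy = -win; dy <= win; ++dy)
                    for (int dx = -win; dx <= win; ++dx) {
                        int diff = int(left[(y + dy) * w + (x + dx)])
                                 - int(right[(y + dy) * w + (x + dx - d)]);
                        ssd += long(diff) * diff;
                    }
                if (ssd < bestSsd) { bestSsd = ssd; bestD = d; }
            }
            disparity[y * w + x] = bestD;
        }
    }
    return disparity;
}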

In Figure 4.9, the textured bottle's curvature and glossy plastic surface, combined with non-uniform lighting, can cause the same surface to appear differently across camera images due to specular reflection. The transparent bottle appears as a distorted version of its background in Figure 4.10, with its appearance changing between viewpoints. There are also many specular reflections on the surface of the bottle. These visual issues violate the similarity assumption employed in dense stereo. Therefore, the disparity results for the object's surface are very inconsistent. Similarly, Figure 4.11 shows that the reflective can acts as a curved mirror that reflects its surroundings. This again results in different surface appearances in the left and right camera images, which leads to poor disparity results. All three objects can be triangulated using bilateral symmetry. Looking at the dense stereo disparity maps, it is difficult to imagine a method that can recover the object location or structure.


(a) Left image

(b) Right image

(c) Dense Stereo Disparity

Figure 4.9: Dense stereo disparity result – Textured bottle. The bottle's location is enclosed by a red rectangle in the disparity map.


(a) Left image

(b) Right image

(c) Dense Stereo Disparity

Figure 4.10: Dense stereo disparity result – Transparent bottle. The bottle's location is enclosed by a red rectangle in the disparity map.


(a) Left image

(b) Right image

(c) Dense Stereo Disparity

Figure 4.11: Dense stereo disparity result – Reflective can. The can's location is enclosed by a red rectangle in the disparity map.


4.3.7 Discussion

The proposed stereo triangulation approach is conceptually different from dense and sparse stereo methods. Firstly, the approach does not rely on an object's surface visual appearance. Instead, a structural feature, bilateral symmetry, is used to perform triangulation. Recall that the fast symmetry detector only relies on edge pixels as input. As a result, symmetry triangulation is able to operate on objects with unreliable surface pixel information, such as those that are transparent and reflective.

Secondly, in the context of detecting and locating objects on a table, dense and sparse stereo approaches only provide 3D location estimates of surfaces. Further model fitting is needed to obtain structural information of objects. By only targeting bilaterally symmetric objects, symmetry triangulation is able to obtain some structural information without relying on object models. While geometric primitives can be used to represent objects for the purpose of robotic manipulation [Taylor, 2004], objects such as the tall white cup in Figure 4.6(a) are difficult to model as a single primitive. In contrast, symmetry triangulation is able to localize this object accurately assuming the geometry of the table plane is known. The table plane geometry can be recovered dynamically by performing robust plane fitting to stereo disparity information.

Triangulated symmetry axes also provide useful information with regards to the planning and execution of robotic manipulations. Recall that the triangulated symmetry axis of an object is always within its surface. For solid objects, a symmetry axis provides useful hints for robotic manipulation without resorting to higher level knowledge. For example, grasping force should generally be applied radially inward, in a manner perpendicular to an object's symmetry axis. This grasping strategy is especially applicable for surface of revolution objects such as cups and bottles.

Symmetry triangulation using stereo has two main limitations. Firstly, triangulation can only be performed on objects exhibiting bilateral symmetry in two camera views. This restriction suggests that the method will be most reliable when operating on objects that are surfaces of revolution. Even with this limitation, a large variety of household objects such as cups, bottles and cans can be triangulated by their symmetry. Also, the edge features used by the fast symmetry detector are visually orthogonal to the pixel information used in dense stereo and most sparse stereo approaches. As such, symmetry triangulation can be applied synergetically with other stereo methods to improve a vision system's ability to deal with symmetric objects, especially those with transparent or reflective surfaces.

Secondly, symmetry triangulation does not explicitly address the problem of correspondence between symmetry lines in the left and right camera images. As the experiments only dealt with scenes containing a single symmetric object, this was a non-issue. However, in a scene with multiple symmetric objects, pairings between symmetry lines belonging to different objects can occur. Triangulation of these pairs may produce phantom symmetry axes that do not correspond to any physical object.


The majority of phantom symmetries can be rejected by examining their location and orientation. For a robot dealing with objects on a table, symmetry axes outside its reachable workspace can be safely ignored. Symmetry axes that do not intersect the table plane in a roughly perpendicular manner can also be rejected as they are unlikely to originate from an upright bilaterally symmetric object. The remaining phantom symmetries are difficult to reject without using additional high level information such as object models or expected object locations. However, robotic action can be used to reject phantom symmetry axes without resorting to the use of object models. Chapter 6 shows that a robotic nudge applied to a symmetry axis can detect the presence of an object and simultaneously obtain a segmentation of the detected object.

Since the triangulation experiments documented in this section were performed, the accuracy of symmetry triangulation has been improved. The current implementation operates with an error of around 5mm, half of the previous average. This reduction in error is due primarily to the use of sub-pixel refinement in the fast symmetry detection process. Also, taking the average result of multiple triangulations will produce a centroid with better error characteristics.

4.4 Chapter Summary

This chapter detailed two methods that make use of detected bilateral symmetry to sense objects in static scenes. The first method uses dynamic programming to perform segmentation on a single image, extracting contours of symmetric objects from the image's edge pixels. Timing trials of the segmentation method suggest comfortable operation at 30 frames per second. The second method uses two images obtained from a pair of calibrated stereo cameras as input data. Symmetry lines detected in the left and right camera images are paired and triangulated to form symmetry axes in three dimensional space. Neither method requires any prior object models before operation. Also, both methods are able to robustly deal with objects that have reflective and transparent surfaces.

This chapter also highlighted several ways to disambiguate between object and background symmetries. For example, the angle between a symmetry axis and the table plane can be used to reject unwanted symmetries. The next chapter details a real time object tracker. The tracker is able to estimate the symmetry line of a moving object while rejecting static background symmetries by using motion and symmetry synergetically.


Time is an illusion, lunchtime doubly so.

Douglas Adams

Chapter 5: Real Time Object Tracking

5.1 Introduction

The previous chapter introduced two computer vision methods that exploit detected symmetry to segment and triangulate static objects. Many robotic tasks, such as object manipulation, demand the sensing of moving objects. These tasks require real time sensing in order to keep track of the target object while providing time-critical feedback to the robot's control algorithms. This chapter details research on real time object tracking. The proposed tracking method makes use of motion and detected symmetry to estimate an object's location rapidly and robustly. The symmetry tracker also returns a segmentation of the tracked object in real time. Additionally, the C++ implementation of the proposed method is able to operate at over 40 frames per second on 640×480 images. The symmetry tracker was first published in [Li and Kleeman, 2006b], with additional experiments and analysis later published in [Li et al., 2008].

As with the segmentation and symmetry triangulation methods, the proposed tracking method is model-free. This allows the tracking of new objects, as long as the target has a stable symmetry line. In order to perform tracking in real time, model-free approaches use features that are computationally inexpensive to extract and match. For example, [Huang et al., 2002] uses region templates of similar intensity. [Satoh et al., 2004] utilizes colour histograms. Both approaches can tolerate occlusions, but are unable to handle shadows and colour changes caused by variations in lighting. As symmetry measurements are detected from edge pixels, the proposed tracker does not suffer from this limitation.

The main contribution of the research presented in this chapter is the use of bilateral symmetry as an object tracking feature. This was previously impossible due to the prohibitive execution times of symmetry detection methods, which has been overcome by using the fast symmetry detector. Additionally, Kalman filter predictions are used to limit the detection orientation range in order to further reduce execution time. On the surface, using bilateral symmetry for object tracking seems very restrictive. However, there are many objects with bilateral symmetry. Many man-made objects are symmetric by design. This is especially true for container objects such as cups and bottles, which are generally surfaces of revolution. Being visually orthogonal to tracking features such as colour, the use of symmetry allows the tracker to operate on objects that are difficult for traditional tracking methods. For example, the proposed method can track transparent objects, which are difficult to deal with using other tracking methods.

5.2 Real Time Object Tracking

The system diagram in Figure 5.1 provides an overview of the proposed object tracker. The tracker uses two time-sequential video images as input data. These images are labelled as ImageT−1 and ImageT in the system diagram. The Block Motion Detector module creates a Motion Mask representing the moving portions of the scene. The motion mask is used to reject edge pixels from static parts of the video, which helps prevent the detection of background symmetries. The remaining edge pixels are given to the Fast Symmetry Detector as input. Detected symmetry lines are provided to the Kalman Filter as Measurements. The Posterior Estimate of the Kalman filter is the tracking result.


Figure 5.1: System diagram of real time object tracker.

Once tracking begins, the Kalman filter provides angle limits to the symmetry detection module. These angle limits represent the orientations of symmetry lines the Kalman filter considers to be valid measurements. These angle limits are generated based on the filter's prediction and the prediction covariance. The speed and robustness of symmetry detection are greatly improved by using these prediction-based angle limits to restrict detection orientation.

The symmetry tracker also returns a motion-based segmentation of the tracking target in real time. The segmentation is performed by using the Kalman filter's posterior estimate to refine the motion mask produced by the block motion detector. The refinement process results in a near-symmetric segmentation of the tracked object. The object segmentation is returned as a Refined Motion Mask and a Bounding Box that encapsulates the mask.

5.2.1 Improving the Quality of Detected Symmetry

Recall from previous discussions that the fast symmetry detector can return non-object symmetry lines due to inter-object symmetry or background features. Non-object symmetries can overshadow object symmetries, especially when the target object has a weak edge contour. An example of this is provided in Figure 5.2(a). The top three symmetry lines are labelled according to the number of Hough votes they received, with line 1 having received the most votes. The weak edge contour of the transparent bottle results in its symmetry line being ranked lower than a background symmetry line.

During tracking, the Kalman filter uses a validation gate to help reject unwanted symmetries, labelled as lines 1 and 3 in Figure 5.2(a). However, a more cluttered scene may introduce so many additional symmetry lines that the target object's symmetry line is not detected at all. The repeated use of these noisy symmetry lines as measurements may lead to Kalman filter divergence. To overcome this, the tracker employs several methods to prevent the detection of noisy symmetry lines.

Firstly, the state prediction of the Kalman filter is used to restrict the orientation range of detection. The filter's θ prediction and the prediction covariance define the detection angle limits, which prevent the detection of symmetry lines with orientations vastly different from the filter's prediction. In the experiments presented here, the angle limits are set at three standard deviations from the filter prediction.
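A minimal sketch of the angle limit computation is given below, assuming the θ prediction variance is read from the prediction covariance matrix; wrapping of the resulting range into the detector's orientation domain is omitted, and names are illustrative.

#include <cmath>

struct AngleLimits { double lower, upper; };

// Derive the symmetry detection orientation range from the Kalman filter's
// theta prediction and its prediction variance (three-sigma band, per the text).
AngleLimits detectionAngleLimits(double thetaPredicted, double thetaPredVariance)
{
    double sigma = std::sqrt(thetaPredVariance);
    return { thetaPredicted - 3.0 * sigma, thetaPredicted + 3.0 * sigma };
}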

Figure 5.2(b) shows a visualization of the detection angle limits during typical tracker operation. Using these angle limits to restrict detection orientation will remove symmetry lines 1 and 3 in Figure 5.2(a), pushing the transparent bottle's symmetry line to number one in terms of Hough votes. Additionally, reducing the angular range of symmetry detection also lowers the execution time linearly, as previously shown in Figure 3.11.


(a) Detection of non-object symmetry

(b) Tracking angle limits

Figure 5.2: Using angle limits to reject non-object symmetry. The top figure shows a case where non-object symmetries are detected ahead of object symmetries. The bottom figure visualizes the angle limits generated from the Kalman filter prediction covariance.


However, restricting detection orientations using angle limits based on the Kalman filter's prediction will not reject noisy symmetries with similar orientations to the symmetry line of the target object. These unwanted symmetries can arise from background features, inter-object symmetries or from other symmetric objects. It is possible for the Kalman filter to latch onto these noisy symmetry lines if the tracked object moves over them at a low speed. This problem is difficult to overcome without relying on an object model of the tracking target. To maintain a model-free approach, the tracker takes advantage of object motion to remove edge pixels from static portions of an image. By doing this, the majority of edge pixels contributing to non-target symmetries are rejected. Details of the motion masking method are provided in the next section.

5.2.2 Block Motion Detection

Many methods of motion detection are available in the computer vision literature. Optical flow approaches, such as the Lucas-Kanade method [Lucas and Kanade, 1981], are able to find spatial and orientation information of motion. Generally, they are employed in situations where there is camera movement or large areas of background motion. However, optical flow is computationally expensive to calculate for large images. Also, as the tracker only needs to reject edge pixels from static portions of a video, the orientation information provided by optical flow is superfluous. Given that the camera system is stationary during operation and the desire for real time object tracking, optical flow techniques are not used to perform motion detection.

Background subtraction approaches monitor image pixels over time to construct a model of the scene. Pixels that significantly alter their values with respect to the model are labelled as moving. Mixture of Gaussian models [Wang and Suter, 2005] are fairly robust to small changes in background statistics such as the repetitive movement of tree branches in the wind. However, background subtraction methods require a training period with no object motion to build a background model. This requires idle training periods between tracking sessions, which is problematic for situations where object motion cannot be prevented or controlled. Another point of note is that background modelling approaches are not suitable for dealing with transparent and reflective objects as they take on the visual characteristics of their surroundings.

The block motion detector uses a frame difference approach to obtain motion information. The classic two-frame absolute difference [Nagel, 1978] is used as it has a very low computational cost. Colour video images are converted to grayscale before performing the frame difference. The difference image is calculated by taking the absolute difference of pixel values between time-adjacent frames. The block motion detection method is described in Algorithm 5.

The block motion detection algorithm produces a blocky motion mask that segments object motion. The algorithm processes the difference image IDIFF using SB × SB square blocks. The choice of block size is based on the smallest scale of motion that needs to be detected.


Algorithm 5: Block motion detection

Input:
  I0, I1 – Input images from time t−1, t
Output:
  IMASK – Motion mask, same size as input images
Parameters:
  TM – Motion threshold
  SB – Block size
  IDIFF – Difference image, same size as input images
  IRES, ISUM – Buffer images with sides 1/SB the length of the input images

1:  IDIFF ← |I1 − I0|
2:  ISUM[ ][ ] ← 0
3:  i ← 0
4:  for ii ← 0 to height of ISUM do
5:    m ← i
6:    i ← i + SB
7:    for increment m until m == i do
8:      j ← 0
9:      for jj ← 0 to width of ISUM do
10:       n ← j
11:       j ← j + SB
12:       for increment n until n == j do
13:         ISUM[ii][jj] ← ISUM[ii][jj] + IDIFF[m][n]
14: IRES ← THRESHOLD(ISUM, MEAN(ISUM) × TM)
15: Median filter IRES
16: Dilate IRES
17: IMASK ← IRES resized by a factor of SB

In the tracker implementation, the block motion detector uses 8 × 8 pixel blocks. The algorithm proceeds by summing the pixel values of the difference image in each 8 × 8 block. This is carried out on lines 3 to 13 of Algorithm 5.

Each block's difference sum is compared against the global mean sum of all blocks. A block whose sum is significantly larger than the global mean is classified as moving. This is performed on line 14 of Algorithm 5 by thresholding the ISUM image. The threshold is a multiple of the global mean. The motion threshold multiplier TM is determined empirically by increasing it from an initial value of 1 until camera noise and small object movements are excluded by the threshold. A motion threshold of 1.5 is used in all tracking experiments presented in this section.

The result of the thresholding is stored in IRES, which is essentially a low resolution binary representation of the motion found between the input video frames I0 and I1. In the C++ implementation, a binary one in IRES is used to represent detected motion. Static blocks are represented by a binary zero. Median filtering is performed after thresholding to remove noisy motion blocks, which can arise from small movements in the background or slight changes in camera pose. The filtered result is dilated to ensure that all edge pixels belonging to the contour of a moving object are included by the motion mask. Finally, the IRES image is resized by a factor of SB to produce the output motion mask IMASK. As the motion mask IMASK is the same size as the input images, a simple logical AND operation is used to reject edge pixels contributed by static parts of the scene.
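For readers wanting a concrete reference point, the sketch below re-expresses the block motion detector using OpenCV primitives. It is an illustrative reconstruction, not the thesis code, and it uses per-block means rather than sums (the two differ only by the constant factor SB²), so the thresholding behaviour is equivalent.

#include <opencv2/opencv.hpp>

// Illustrative OpenCV re-implementation of Algorithm 5 for grayscale frames.
// Block size SB = 8 and threshold multiplier TM = 1.5, as used in the text.
cv::Mat blockMotionMask(const cv::Mat& grayPrev, const cv::Mat& grayCurr,
                        int SB = 8, double TM = 1.5)
{
    cv::Mat diff;
    cv::absdiff(grayCurr, grayPrev, diff);                    // |I1 - I0|

    // Per-block mean difference (proportional to the per-block sum)
    cv::Mat blockMean;
    cv::resize(diff, blockMean, cv::Size(diff.cols / SB, diff.rows / SB),
               0, 0, cv::INTER_AREA);

    // Blocks well above the global mean are labelled as moving
    double globalMean = cv::mean(blockMean)[0];
    cv::Mat moving;
    cv::threshold(blockMean, moving, globalMean * TM, 255, cv::THRESH_BINARY);

    cv::medianBlur(moving, moving, 3);                        // remove isolated blocks
    cv::dilate(moving, moving, cv::Mat());                    // grow mask around contours

    cv::Mat mask;
    cv::resize(moving, mask, diff.size(), 0, 0, cv::INTER_NEAREST);
    return mask;                                              // 255 = moving, 0 = static
}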

5.2.3 Object Segmentation and Motion Mask Refinement

Apart from being used to reject unwanted edge pixels, the motion mask is also used to obtain a segmentation of the tracked object. Motion segmentation is performed in real time by zeroing image pixels labelled as static in the motion mask. Motion mask segmentation results are shown in Figures 5.3(a) and 5.4(a). Motion masking is implemented using a logical AND operation. This allows real time operation as the logical AND has a low computational cost. However, the raw motion mask tends to produce segmentations with gaps, which can be seen in both segmentations. Also, pixels that do not belong to the target object, such as the arm actuating the object, can be included in the segmentation result. To overcome this problem, a refinement step is applied to improve the segmentation result.

Refinement is performed on the binary motion blocks image, named IRES in Algorithm 5. This dramatically reduces the computational cost of refinement as the number of pixels in IRES is 1/SB² of the motion mask (IMASK). The refinement is performed as follows.

Firstly, motion that does not belong to the target object is rejected by enforcing a symmetry constraint. Each block with motion is reflected across the object's symmetry line. If a local window centered at the reflected location contains no motion, the motion block is rejected. This results in a segmentation that is roughly symmetric about the tracked object's symmetry line. Secondly, holes and gaps in the mask are removed using a local approach. A small window is passed over the IRES image. If a block has many moving neighbours within the window, it is labelled as moving. This two-step refinement process is very computationally efficient as it does not require multiple iterations.
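A simplified C++ sketch of the two-step refinement is given below. For brevity the symmetry line is assumed vertical at block column symCol, and the neighbour-count threshold for gap filling is an assumed value; names are illustrative.

#include <vector>

// Illustrative two-step refinement of the block-level motion image (1 = moving).
void refineMotionBlocks(std::vector<std::vector<int>>& blocks, int symCol, int win = 1)
{
    const int rows = static_cast<int>(blocks.size());
    const int cols = static_cast<int>(blocks[0].size());

    auto motionNear = [&](const std::vector<std::vector<int>>& img, int r, int c) {
        int count = 0;
        for (int dr = -win; dr <= win; ++dr)
            for (int dc = -win; dc <= win; ++dc) {
                int rr = r + dr, cc = c + dc;
                if (rr >= 0 && rr < rows && cc >= 0 && cc < cols && img[rr][cc])
                    ++count;
            }
        return count;
    };

    // Step 1: reject moving blocks whose reflection across the symmetry line
    // has no motion in a local window (enforces a roughly symmetric mask).
    std::vector<std::vector<int>> sym = blocks;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (blocks[r][c]) {
                int rc = 2 * symCol - c;                 // reflected block column
                if (rc < 0 || rc >= cols || motionNear(blocks, r, rc) == 0)
                    sym[r][c] = 0;
            }

    // Step 2: fill holes - a block with many moving neighbours becomes moving.
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            blocks[r][c] = (motionNear(sym, r, c) >= 5) ? 1 : sym[r][c];  // 5 is assumed
}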

Figures 5.3(b) and 5.4(b) show the improved segmentations obtained by using symmetry to refine the motion mask. Again, a logical AND operation is used to perform the segmentation. The posterior symmetry line estimate used by the refinement process is shown as a red line. Notice that the majority of the experimenter's arm is removed in both symmetry-refined segmentations.


(a) Motion mask segmentation

(b) Symmetry-refined motion segmentation

Figure 5.3: Motion mask object segmentation – White bottle. The tracker's posterior estimate of the symmetry line is drawn in red. This symmetry line is used to produce the refined segmentation.


(a) Motion mask segmentation

(b) Symmetry-refined motion segmentation

Figure 5.4: Motion mask object segmentation – White cup. The tracker's posterior estimate of the symmetry line is drawn in red. This symmetry line is used to produce the refined segmentation.


5.2.4 Kalman Filter

After removing edge pixels from static portions of the input image using the motion mask, fast symmetry detection is performed using the remaining edge pixels. The (R, θ) parameters of the detected symmetry lines are provided to a Kalman filter as measurements. The Kalman filter combines input measurements and a state prediction to maintain a posterior estimate of the target object's symmetry line. The filter state contains the position, velocity and acceleration of the symmetry line's parameters. The Kalman filter implementation is based on [Bar-Shalom et al., 2002]. The filter plant, measurement and state matrices are as follows.

A = [ 1  0  1  0  1/2  0
      0  1  0  1  0    1/2
      0  0  1  0  1    0
      0  0  0  1  0    1
      0  0  0  0  1    0
      0  0  0  0  0    1 ]

H = [ 1  0  0  0  0  0
      0  1  0  0  0  0 ]

x = [ R  θ  Ṙ  θ̇  R̈  θ̈ ]ᵀ        (5.1)

The process and measurement covariances are determined empirically. The matrices assume that the noise variables are independent, with no cross correlation between R and θ values. The diagonal elements of the process covariance matrix are 1, 0, 1, 10, 1, 10, 1. The remaining elements in the process covariance matrix are zero. The measurement covariance for R is 9 pixels² and the θ covariance is 9 degrees².

For each video image, multiple symmetry lines are given to the Kalman filter as measurements. As tracking is performed on a single target object, the Kalman filter must choose the best measurement to update its tracking estimate. In other words, the problem of data association must be addressed. Data association is performed using a validation gate. The validation gate is based on the mathematics presented in [Kleeman, 1996].

Measurements are validated using the following procedure. Symmetry line parameters that generate an error above 9.2, which equates to the 2-DOF Chi-square value at P = 0.01, are rejected by the validation gate. If no symmetry line passes through the gate, the next state will be estimated using the state model alone. If multiple valid symmetry lines pass through the validation gate, the symmetry line with the lowest validation error is used to update the Kalman filter estimate. Hence, the validation gate performs both measurement validation and data association.
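A sketch of the gating and data association step is given below, building on the Kalman1D sketch above. The 9.2 gate threshold is the 2-DOF Chi-square value quoted above; the structure and names are otherwise illustrative.

#include <vector>
#include <limits>
#include <cstddef>

struct SymLine { double R, theta; };

// Gate and associate the symmetry lines detected in one frame, then update the
// filters with the best valid candidate. Returns false if every candidate was
// rejected, in which case the tracker coasts on its prediction.
bool associateAndUpdate(Kalman1D& filterR, Kalman1D& filterTheta,
                        const std::vector<SymLine>& detections, double gate = 9.2)
{
  filterR.predict();
  filterTheta.predict();

  int best = -1;
  double bestErr = std::numeric_limits<double>::max();
  for (std::size_t i = 0; i < detections.size(); ++i) {
    // With uncorrelated R and theta innovations, the 2-DOF gate value is the
    // sum of the two scalar normalized innovations.
    double err = filterR.nis(detections[i].R) + filterTheta.nis(detections[i].theta);
    if (err < gate && err < bestErr) { bestErr = err; best = int(i); }
  }
  if (best < 0) return false;
  filterR.update(detections[best].R);
  filterTheta.update(detections[best].theta);
  return true;
}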

Traditionally, Kalman filter initialization is performed manually by specifying the starting state of the tracking target at a particular video image. However, as the eventual goal is to use the symmetry tracker to estimate the location of new objects, automatic initialization is needed. The initialization method begins with a crude specification of the object's initial state to ensure filter convergence.


The video image for which tracking begins does not have to be specified manually. Instead, the tracker monitors the number of moving blocks returned by the block motion detector. By looking for a sharp jump in the quantity of detected motion, the video frame where the target object begins to move is determined.

Symmetry lines detected from the three video images after the time of first object movement are used to automatically initialize the Kalman filter. All possible data associations are generated across these three time-consecutive video images. In the tracking experiments, the Nlines parameter in Algorithm 1 is set to three. This limits the maximum number of detected symmetry lines to three, which constrains the number of possible symmetry line associations to 27 permutations.

The tracking error of each permutation is examined using the Kalman filter. For each permutation, the first measurement is used to set the filter's initial state. The filter is updated in the usual manner using the second and third measurement of the permutation. The validation gate error is accumulated across all three measurements for each permutation. The permutation resulting in the smallest accumulated validation error is used to initialize the Kalman filter for actual object tracking. This automatic initialization method is used to bootstrap tracking in all experiments presented in this section.
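The initialization search can be sketched as follows, reusing the Kalman1D and SymLine types from the earlier fragments. The exhaustive loop over one symmetry line per frame reproduces the 27-permutation search described above; the initial covariance values are placeholders.

#include <array>
#include <vector>
#include <limits>

// Build a filter with a crude initial state: position from the first
// measurement, zero velocity and acceleration, and a large diagonal covariance.
static Kalman1D makeInitialFilter(double z0, double positionVariance) {
  Kalman1D f;
  f.x = {z0, 0.0, 0.0};
  for (int i = 0; i < 3; ++i) f.P[i][i] = positionVariance;  // placeholder uncertainty
  return f;
}

// Try every association of one symmetry line per frame over the first three
// frames (at most 3 x 3 x 3 = 27 permutations), score each permutation by its
// accumulated validation error and keep the lowest-error filters.
bool autoInitialize(const std::array<std::vector<SymLine>, 3>& frames,
                    Kalman1D& outR, Kalman1D& outTheta)
{
  double bestErr = std::numeric_limits<double>::max();
  bool found = false;
  for (const SymLine& a : frames[0])
    for (const SymLine& b : frames[1])
      for (const SymLine& c : frames[2]) {
        Kalman1D fR = makeInitialFilter(a.R, 100.0);
        Kalman1D fT = makeInitialFilter(a.theta, 1.0);
        double err = 0.0;
        for (const SymLine* m : {&b, &c}) {
          fR.predict();
          fT.predict();
          err += fR.nis(m->R) + fT.nis(m->theta);
          fR.update(m->R);
          fT.update(m->theta);
        }
        if (err < bestErr) { bestErr = err; outR = fR; outTheta = fT; found = true; }
      }
  return found;
}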

5.3 Object Tracking Results

The proposed symmetry tracker is tested against ten videos, each containing at least one bilaterally symmetric object under motion. The test videos were captured with an IEEE1394 CMOS camera in an indoor environment without controlled lighting. The videos were captured at 25 frames per second with a resolution of 640 × 480 pixels. Image sequences from several tracking videos can be found at the end of the author's IROS paper [Li and Kleeman, 2006b].

Videos of the tracking results can be found in the tracking folder of the multimedia DVD included with this thesis. The videos are available as WMV and H264 files in their respective folders. The H264 videos have better image quality than the WMV videos but require more processing power for playback decoding. If video playback problems occur, the author recommends the cross platform and open source video player VLC. A Windows XP installer for VLC is available in the root folder of the multimedia DVD. Note that the tracking videos are also available online:
• www.ecse.monash.edu.au/centres/irrc/li_iro2006.php

In the video results, the posterior estimate of the tracked object's symmetry line is drawn as a thick red line. The green bounding box encloses the refined motion mask of the tracked object. Note that the rectangular bounding box is automatically rotated so that two of its sides are parallel to the symmetry line. Figures 5.5 and 5.6 contain two examples of how rotated bounding boxes are generated from refined motion masks.
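A sketch of the bounding box construction: project the centre of every moving block onto the symmetry line direction and its normal, take the extents in both coordinates, and map the four extreme combinations back to image coordinates. The input representation and names below are illustrative.

#include <cmath>
#include <vector>
#include <algorithm>

struct Point { double x, y; };

std::vector<Point> rotatedBoundingBox(const std::vector<Point>& movingBlocks, double theta)
{
  const double nx = std::cos(theta), ny = std::sin(theta);  // symmetry line normal
  const double dx = -ny, dy = nx;                           // symmetry line direction
  double minN = 1e18, maxN = -1e18, minD = 1e18, maxD = -1e18;
  for (const Point& p : movingBlocks) {
    double n = p.x * nx + p.y * ny;   // coordinate across the symmetry line
    double d = p.x * dx + p.y * dy;   // coordinate along the symmetry line
    minN = std::min(minN, n); maxN = std::max(maxN, n);
    minD = std::min(minD, d); maxD = std::max(maxD, d);
  }
  std::vector<Point> corners;
  for (double n : {minN, maxN})
    for (double d : {minD, maxD})
      corners.push_back({n * nx + d * dx, n * ny + d * dy});
  return corners;   // two opposite sides are parallel to the tracked symmetry line
}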


(a) Refined motion mask

(b) Rotated bounding box

Figure 5.5: Generating a rotated bounding box from the refined motion mask of a transparent bottle. The bounding box is drawn in green and the tracker's posterior symmetry line estimate is drawn in red.


(a) Refined motion mask

(b) Rotated bounding box

Figure 5.6: Generating a rotated bounding box from the refined motion mask of a multi-colour mug. The bounding box is drawn in green and the tracker's posterior symmetry line estimate is drawn in red.


The bounding box may appear unstable or overly large in some videos. This does not imply tracker divergence, as the bounding box is not being estimated by the tracker. These jitters in the size and location of the bounding box are due to temporal changes in the refined motion mask. Temporal filtering can be applied to the bounding box parameters to remove these jitters. For the sake of visual clarity, separate videos without the bounding box are available for test videos 1, 2 and 9. These video files are labelled with the suffix symline only.

5.3.1 Discussion of Tracking Results

For all ten test videos, the proposed tracker remained convergent during operation, never diverging from the target object's symmetry line. The test videos include a variety of visually challenging situations that may be encountered by a tracking system employed in the domain of robotics. This section discusses these challenges and how they are met by the proposed tracking method. Note that the test videos will be referred to by the number at the end of their filenames.

Large Changes in Object Pose

An object's pose refers to its location and orientation in three-dimensional space. Due to the projective nature of visual sensing, scale changes can result from object pose changes if the object moves towards or away from the camera. Similarly, a change in object orientation can alter the visual appearance of an object dramatically. For a model-free tracking approach such as the proposed method, the lack of a priori object knowledge further increases the difficulty of pose changes as no predictions can be made concerning future object appearances.

Test videos 3, 5, 6 and 7 all contain large changes of the object pose. Video 3 shows the successful tracking of a symmetric cup over a range of orientations, well over 120 degrees. Also, the cup is tilted towards the camera during the video, which alters its visual appearance. Video 5 shows a textured bottle being tracked across different orientations and locations. The tracker is able to successfully follow the bottle through various pose changes, including a large tilt away from the camera that occurs at the end of the video.

Video 6 increases the difficulty by including large changes in object scale. The video is also much longer and includes a larger variety of object poses. The tracker is able to maintain track of the cup during the entire video without any jitters in the posterior estimate. Video 7 is similar to video 6, but the difficulty is greater as a multi-colour mug is now the tracking target. The mug has more noisy edge pixels and produces a weaker symmetry line due to its relatively short edge contour. Also, near the middle of the video, a large horizontal jump in the object's location can be observed. This is caused by an operating system lag during video capture. The tracker is able to cope with these additional challenges, maintaining a convergent posterior estimate of the object's symmetry line during the entire video.


Overall, the results suggest that the proposed symmetry tracker is capable of tracking objects across large pose changes.

Object Transparency

Transparent objects, as shown in previous chapters, are difficult to deal with using traditional computer vision approaches due to their unreliable visual appearance. Video 1 shows the convergent tracking of a transparent bottle across different orientations. The bottle produces very few edge pixels, resulting in several jitters of the posterior estimate during tracking. However, the tracker did not diverge and the object's symmetry line is tracked successfully. Tracking methods that make use of an object's internal pixel information will most likely diverge during this tracking video. For example, a colour-based approach will encounter vastly different hue histograms during tracking as the observed colour of the bottle changes from a combination of brown and green to nearly entirely green when it arrives at a horizontal orientation.

Video 10 also shows a successful tracking result for a transparent bottle. In this test video, the bottle is moved in an L-shaped manner. This motion trajectory was chosen in an attempt to cause tracker divergence by constantly introducing sharp changes in the velocity and acceleration of the symmetry line parameters. Also, small changes in the object scale can be observed during the portions of the video where the bottle is actuated vertically. Note that the appearance of the bottle changes drastically as it moves over backgrounds of different hues and intensities. Again, this will cause trackers that rely on an object's surface appearance to diverge as the object sharply changes colour and intensity during the horizontal portion of the L-shaped movement. The convergent tracking achieved on this difficult video further confirms the proposed tracker's robustness against changes of object appearance. The results also show that the tracking method is able to robustly track transparent objects.

Occlusion of Tracked Object

Similar to object transparency, occlusions of the tracking target change its visual appearance. However, occlusions also remove visual information that may provide features sorely needed to maintain tracking convergence. In video 2, a green bottle is occluded by another green object. The similar colours of the tracking target and the occluding object are difficult for colour-based tracking methods to resolve without relying on object models. However, the proposed model-free symmetry-based method is able to keep track of the green bottle's symmetry line despite the occlusion. The tracker's success is due to the fast symmetry detector's use of edge pixels, which are more reliable in situations where occlusions of similar colour to the tracked object occur.

In video 4, the occlusion problem is made more difficult by using a bilaterally symmetric object to block the target. Note also that the mug reverses direction near the point of occlusion, which may cause tracker divergence if the posterior estimate latches onto the symmetry of the occluding cup.


The convergent tracking result suggests that the block motion detector is able to generate a motion mask that rejects the static occlusion's edge pixels while retaining enough edge pixels from the tracking target to allow for reliable symmetry detection.

Two Objects with Bilateral Symmetry

In the remaining test videos, two moving objects with bilateral symmetry are present. The tracker is initiated on one object while the other object pollutes the Kalman filter measurements by contributing noisy symmetry lines. In video 8, the tracker follows a bottle as it collides with a symmetric white cup, knocking the cup over in the process. The bottle comes to a sudden stop after the collision. Due to the inclusion of velocity and acceleration in the Kalman filter state model, the collision can cause tracking divergence due to a large difference between the filter prediction and the bottle's symmetry line measurement. Also, the cup's symmetry line is near the filter's post-collision prediction, which can cause the filter posterior estimate to latch onto the cup. The convergent tracking result achieved for video 8 indicates that the combination of motion masking and Kalman filtering is robust and flexible enough to handle object collision scenarios.

In video 9, two symmetric objects move in opposite directions. Their symmetry lines pass over each other near the middle of the video. As both objects are moving simultaneously, edge pixels from both objects are included by the motion mask. Due to the model-free nature of the proposed tracking approach, the symmetry lines of both objects are given to the Kalman filter as measurements. As such, this test video examines the Kalman filter's ability to correctly associate symmetry line measurements with the tracked object. The convergent tracking result suggests that the Kalman filter's linear acceleration model, coupled with the careful choice of covariance values, is able to disambiguate symmetry lines from different moving objects.

Note that the tracker's motion-based segmentation generates an overly large bounding box when the two objects pass over each other. This temporary enlargement of the bounding box is due to symmetry between the motion of the blue cup and the hand actuating the white cup. As the motion mask does not contain orientation information, it is unable to distinguish between different sources of motion. As the primary goal is to maintain a convergent estimate of a tracked object's symmetry line, sporadic enlargement of the bounding box has no impact on tracking performance. The application of temporal filtering to the bounding box parameters will improve the stability of the bounding box. The use of a motion detection method that provides orientation information, such as optical flow, will further improve the quality and reliability of the motion segmentation result.


5.3.2 Real Time Performance

The entire tracking system is implemented using C++, with no platform specific optimizations. A notebook PC with a 1.73GHz Pentium M processor is used as the test computer platform. As previously stated, the test videos are all 640 × 480 pixels in size and captured at 25 frames per second. Note that the timing results presented in this section are from trials carried out prior to the use of the Intel C Compiler. As such, the current implementation has even shorter execution times. The current tracker is able to operate comfortably at 40 frames per second.

Table 5.1 contains the execution times of the tracker. Note that the video number is the same as the number at the end of the video's filename. The longest video contains 400 frames. The code responsible for symmetry detection, block motion detection, motion mask refinement and Kalman filtering is timed independently. Only symmetry detection and Kalman filtering are absolutely necessary for object tracking, but block motion detection will greatly improve the tracker's robustness to static symmetry lines. The column labelled Init contains the time taken to perform automatic initialization of the Kalman filter, as discussed at the end of Section 5.2.4. The mean frame rates during tracking are recorded in the FPS column.

Table 5.1: Object tracker execution times and frame rate

Video      Mean execution time (ms)                      Init      FPS
number     Sym detect   Motion   Refine   Kalman         (ms)      (Hz)

1          37.87        4.84     0.86     0.09           10.41     22.91
2          16.76        4.76     0.75     0.06            9.74     44.77
3          17.95        4.85     0.85     0.04           10.69     42.22
4          18.31        4.74     0.75     0.04           11.90     41.96
5          33.69        4.87     0.87     0.05           11.38     25.33
6          20.84        4.94     0.85     0.04           13.18     37.50
7          35.29        5.01     0.87     0.13           11.32     24.22
8          34.48        4.94     0.79     0.14           11.14     24.79
9          18.19        4.91     0.79     0.06           11.83     41.75
10         27.01        4.89     0.82     0.06           12.50     30.51

MEAN       26.04        4.88     0.82     0.07           11.41     33.60

5.4 Bilateral Symmetry as a Tracking Feature

In computer vision, the performance of a tracking method is usually gauged by its success rate at maintaining convergent estimates over a set of test videos. The last section examined the proposed tracker's performance on ten test videos. The experiments showed that the tracker is fast and robust, achieving convergent real time tracking in all ten videos. The results showed that the tracker is able to deal with partial occlusions, object transparency, large pose changes of the tracking target and scenes with multiple symmetric objects.


However, robotic tasks employing the proposed tracker may have accuracy requirements, which are not addressed by the experiments thus far.

Firstly, the accuracy of tracking that can be achieved using detected symmetry should be evaluated quantitatively. The evaluation should be performed using different backgrounds to effectively gauge the flexibility and robustness of symmetry as a tracking feature. Secondly, symmetry and colour should be compared qualitatively as object tracking features. Colour is chosen as the feature to compare against as it uses the pixel information within an object's contour, which is visually orthogonal to the edge pixels the fast symmetry detector relies upon.

In order to meaningfully perform the evaluation and the comparison, tracking results must be compared against a reliable and accurate measure of the tracked object's symmetry line. This is difficult as ground truth symmetry lines must be found for every video frame. Obtaining ground truth data for every video frame is also time consuming, especially considering that manual processing is usually required for real world video sequences. However, by exploiting the predictable motion of a pendulum, ground truth data can be obtained automatically.

5.4.1 Obtaining Ground Truth

To evaluate the accuracy of symmetry tracking under various background conditions, the tracker's symmetry line estimate must be compared against the ground truth symmetry line of the tracked object. Ground truth data is generally troublesome to obtain due to the lack of constraints on the target object's trajectory. Also, manually extracting the object's symmetry line in long videos can introduce human errors into the ground truth data. As such, a combined hardware-software solution is used to extract ground truth data automatically.

On the hardware side, a pendulum rig provides predictable oscillatory object motion. This custom-built pendulum is shown in Figure 5.7. The pendulum pivot has 1 degree of freedom, which constrains the object's motion to a plane. A hollow carbon fiber tube is used as the pendulum arm to limit flex during motion. The test object is a red plastic squeeze bottle. The bottle is mounted at the end of the pendulum by passing the pendulum arm through its axis of symmetry, which is also its axis of revolution.

Colour markers are placed above and below the object on the pendulum arm. These markers provide stable colour features for the ground truth extraction software. A simple colour filter is used to extract pixels belonging to the colour markers. The centroid of each marker is found by calculating the center of mass of its extracted pixels. The polar parameters of the line passing through both markers' centroids are recorded as the ground truth symmetry line. This symmetry line extraction process is done without any human assistance.


(Labelled components: 1 degree-of-freedom pivot, carbon fiber tube, ground truth markers.)

Figure 5.7: Pendulum hardware used to produce predictable object motion.

An example of an automatically extracted ground truth symmetry line is shown in Figure 5.8.
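A small sketch of the marker-to-line conversion is shown below: given the two marker centroids, the line's normal angle θ and its signed distance R from the image centre are computed directly. The names and the coordinate convention (R measured from the image centre, as used by the fast symmetry detector) are assumptions consistent with the rest of this chapter.

#include <cmath>

struct PolarLine { double R, theta; };

PolarLine lineThroughMarkers(double x1, double y1,   // centroid of the upper marker
                             double x2, double y2,   // centroid of the lower marker
                             double imageW, double imageH)
{
  double dx = x2 - x1, dy = y2 - y1;                 // direction of the line
  double theta = std::atan2(dx, -dy);                // angle of the line's normal
  double nx = std::cos(theta), ny = std::sin(theta);
  double cx = 0.5 * imageW, cy = 0.5 * imageH;
  double R = (x1 - cx) * nx + (y1 - cy) * ny;        // signed distance from image centre
  return {R, theta};
}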

Before proceeding to use the automatically extracted symmetry lines as ground truth data, their accuracy must be verified. As a pendulum moves in a predictable manner, the automatically extracted symmetry lines are compared against theoretical expectations. Equations 5.2 and 5.3 describe the R and θ parameters of the pendulum.

\theta(t) = A e^{-\alpha t} \cos\!\left(\omega (t - t_0)\right) + B \qquad (5.2)

R(t) = L\,\theta(t) + L_0 \qquad (5.3)


Figure 5.8: The automatically extracted ground truth symmetry line is shown in black. The centroids of the colour markers are shown as red and green dots.

As the angular magnitude of oscillations during the experiments is small, the pendulum equations use the small-angle approximation θ ≈ sin θ. Note that R(t) is a function of θ(t). The damping is modelled as an exponential. The parameter α governs the rate of decay.
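For reference, the following C++ sketch evaluates the damped pendulum model of Equations 5.2 and 5.3 and computes the mean absolute residual used in Table 5.2 below, given a set of fitted parameters. The structure and names are illustrative; the actual fitting is performed in MATLAB as described next.

#include <cmath>
#include <vector>
#include <cstddef>

struct PendulumModel {
  double A, alpha, omega, t0, B;   // theta(t) = A exp(-alpha t) cos(omega (t - t0)) + B
  double L, L0;                    // R(t) = L theta(t) + L0

  double theta(double t) const {
    return A * std::exp(-alpha * t) * std::cos(omega * (t - t0)) + B;
  }
  double radius(double t) const { return L * theta(t) + L0; }
};

// Mean absolute residual between the extracted ground truth values and the model.
double meanAbsResidual(const std::vector<double>& t,
                       const std::vector<double>& measured,
                       const PendulumModel& m, bool useRadius)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < t.size(); ++i) {
    double model = useRadius ? m.radius(t[i]) : m.theta(t[i]);
    sum += std::fabs(measured[i] - model);
  }
  return t.empty() ? 0.0 : sum / t.size();
}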

To evaluate the accuracy of the extracted symmetry lines, non-linear regression is performed using Equations 5.2 and 5.3. MATLAB's nlinfit function is used to perform the non-linear regression, simultaneously estimating A, α, t0, B, L and L0. The mean absolute regression residuals of the extracted symmetry lines are examined across four test videos, each containing 1000 images. Example images from the test videos are available from Figures 5.9 to 5.12. All four sets of 1000 test video images are available from the pendulum folder of the multimedia DVD. The readme.txt text file can be consulted for additional information. Note that the same test videos are used in Sections 5.4.2 and 5.4.3. Table 5.2 contains the mean regression errors for the automatically extracted ground truth symmetry lines.

Table 5.2: Pendulum ground truth data – Mean of absolute regression residuals

Video background    R (pixels)    θ (radians)
White               0.39          0.0014
Red                 0.76          0.0021
Edge                1.82          0.0025
Mixed               0.51          0.0014

OVERALL MEAN        0.87          0.0019


Figure 5.9: Pendulum video images – White background.

Figure 5.10: Pendulum video images – Background with red distracters.


Figure 5.11: Pendulum video images – Background with edge noise.

Figure 5.12: Pendulum video images – Mix of red distracters and edge noise.


The regression residuals for the extracted symmetry lines are very small across all four test videos. The mean error of R is less than 1 pixel and the mean θ error is roughly 0.1 degrees. These low residuals suggest that the proposed marker-based method of extracting symmetry lines is capable of providing reliable ground truth data. The small residuals indicate that the extracted symmetry lines behave according to the damped pendulum described by Equations 5.2 and 5.3. More importantly, the regression results imply that the ground truth symmetry lines are good estimates of the object's actual symmetry line.

5.4.2 Quantitative Analysis of Tracking Accuracy

After establishing a way to obtain reliable ground truth data, quantitative examination of the symmetry tracker can proceed. Four test videos, each containing 1000 images, are used to thoroughly evaluate the symmetry tracker's accuracy. The test videos each show around 10 oscillations of the pendulum in front of different static backgrounds. Example images from the four test videos are displayed in Figures 5.9 to 5.12.

The following four backgrounds are used in the test videos. In the first test video, a white background is used as a control experiment to examine the tracker under near-ideal background conditions. Example images from the first test video are shown in Figure 5.9. However, specular reflections and shadows are still quite prominent for some object poses. Red distracters, similar in colour to the tracked object, are added to the background in the second test video, as can be seen in Figure 5.10. To increase input edge noise to the symmetry detector, high-contrast line features are present in the background of the third video. Images from this video can be found in Figure 5.11. The fourth video contains both red distracters and edge noise in the background, as shown in Figure 5.12.

To measure tracking accuracy, the tracker result is compared against ground truth for each frame of a video. As the main concern is to investigate the performance of detected symmetry as a tracking feature, the focus is on the quality of measurements provided to the Kalman filter. Also, the Kalman filter motion model may provide uneven reductions in tracking error during the swing. Therefore, Kalman filtering is not applied during the tracking accuracy trials. However, to simulate actual tracking operation, the tracker continues to use the motion mask to reject edge pixels from static portions of the video image before applying symmetry detection. As little motion is experienced at the extremities of the pendulum swing, motion masking cannot be applied to these video images. To remove any bias introduced by the lack of motion masking, the accuracy analysis ignores video images within five frames of the pendulum's turning point.

Table 5.3 provides a statistical summary of the pendulum symmetry tracking error. The table columns from left to right are the mean of absolute errors, standard deviation of errors and the median of absolute errors. The difference between the polar parameters of the tracker's result and ground truth data is used as the error measure. Error plots for R and θ are shown in Figures 5.14 to 5.17. The tracking errors are coloured blue.


Due to the length and cyclic nature of the test videos, each error plot only shows the first 400 images of a test video. The mean-subtracted ground truth data is plotted as a black dotted line against a different vertical axis for visualization purposes. Histograms of tracking errors are shown in Figures 5.18 to 5.21.

Table 5.3: Pendulum symmetry tracking error statistics

Video background    Polar parameter    Abs. mean    Std.      Abs. median
White               R (pixels)         1.1256       1.1350    0.5675
                    θ (radians)        0.0057       0.0048    0.0043
Red                 R (pixels)         2.0550       1.7955    2.8732
                    θ (radians)        0.0134       0.0110    0.0129
Edge                R (pixels)         1.2118       1.0529    0.8765
                    θ (radians)        0.0078       0.0053    0.0079
Mixed               R (pixels)         3.4147       1.6186    7.4565
                    θ (radians)        0.0192       0.0099    0.0375

White Background

The error plots for the white background video are contained in Figure 5.14. Example images from the video are shown below the two error plots. The left histogram in Figure 5.18 shows a small DC offset in the radius error. From qualitative examination of the edge images, this offset appears to be caused by a small drift in edge pixel locations due to uneven lighting and object motion blur.

The θ error plot shows that the orientation error tends to increase in magnitude near ground truth zero crossings. This increase in tracking error appears to be correlated with object speed, which is fastest near the middle of the pendulum swing where θ = 0. Higher object velocities increase motion blur, which introduces noise into the locations of detected edge pixels, thereby also reducing the accuracy of detected symmetry.

The error plots show that both radius and θ errors are small, despite specular reflections on the right side of the bottle and fairly strong background shadows. The application of a temporal filter, such as the Kalman filter employed in the proposed tracker, will further reduce tracking errors. The low-error tracking results suggest that the fast symmetry detection method provides accurate measurements to a tracker when the tracked object is set against a plain background.

Background with Red Distracters

The error plots in Figure 5.15 show that a background littered with distracters, similar in colour to the tracked object, increases tracking errors. The increase in error appears to be due to missing edge pixels along the object's contour. These missing edges are caused by the lack of intensity contrast between the bottle and the red background distracters. The reduction in object-background contrast also reduces the size of the motion mask, causing some edge pixels in the tracked object's contour to be rejected.


Again, the errors appear to be greater in magnitude near ground truth zero crossings, where the object is moving at a greater speed.

Overall, the symmetry tracking errors are still very small. In the radius error plot, the magnitude of the error rarely exceeds 4 pixels, which is less than 1 percent of the image width. The two peaks in the θ error plot are roughly 3 degrees in magnitude. The histograms in Figure 5.19 show that over the entire 1000 frames of the test video there are a few errors of larger magnitude. Due to their low frequency of occurrence, temporal filters, such as a Kalman filter, can easily deal with these error spikes. As such, it appears that even with similarly coloured distracters in the background, detected symmetry can be used as a reliable source of measurements for the purpose of object tracking.

Background with Edge Noise

The error plots in Figure 5.16 suggest that an increase in input edge noise has little impact on tracking accuracy. Figure 5.13 contains an example of the motion-masked edge pixels given to the symmetry detector during the test video. Notice that despite the rejection of static edge pixels using the motion mask, the remaining edge pixels are still very noisy. However, the fast symmetry detection method is still able to accurately recover the bottle's symmetry line.

The histograms in Figure 5.20 confirm that background edge noise has little impact on tracking accuracy. Comparing error magnitudes, it appears that background distracters of similar colour to the tracked object deteriorate tracking accuracy more than background edge noise. This observation implies that missing edge pixels, caused by low object-background contrast, are more harmful to symmetry detection than the addition of random edge pixel noise.

Mixed Background

The error plots for the final pendulum test video are shown in Figure 5.17. The example video frames show that the background contains a combination of high-contrast edges and red distracters. The error plots contain several large spikes near zero crossings of the ground truth data. These error spikes are due to a combination of motion blur and missing edge pixels caused by the low object-background contrast provided by the red background distracters. The tracking results presented in Section 5.3, especially videos with low object-background contrast such as those involving transparent objects, suggest that the Kalman filter is able to handle these sparse error spikes.


(a) Input edge pixels

(b) Detected symmetry line

Figure 5.13: Example symmetry detection result from test video with background edge pixel noise. The top image shows the motion-masked edge pixels given to the fast symmetry detector as input. In the bottom image, the symmetry line returned by the detector is shown in blue.


(Plots of Radius Error (pixels) and θ Error (radians) versus Frame Number for the white background video, with the mean-subtracted ground truth overlaid.)

Figure 5.14: White background – Symmetry tracking error plots.


(Plots of Radius Error (pixels) and θ Error (radians) versus Frame Number for the video with red distracters, with the mean-subtracted ground truth overlaid.)

Figure 5.15: Background with red distracters – Symmetry tracking error plots.


(Plots of Radius Error (pixels) and θ Error (radians) versus Frame Number for the video with background edge noise, with the mean-subtracted ground truth overlaid.)

Figure 5.16: Background with edge noise – Symmetry tracking error plots.


(Plots of Radius Error (pixels) and θ Error (radians) versus Frame Number for the mixed background video, with the mean-subtracted ground truth overlaid.)

Figure 5.17: Mixed background – Symmetry tracking error plots.


(a) R Error (pixels)    (b) θ Error (radians)

Figure 5.18: White background – Histograms of symmetry tracking errors.

(a) R Error (pixels)    (b) θ Error (radians)

Figure 5.19: Background with red distracters – Histograms of symmetry tracking errors.

(a) R Error (pixels)    (b) θ Error (radians)

Figure 5.20: Background with edge noise – Histograms of symmetry tracking errors.

(a) R Error (pixels)    (b) θ Error (radians)

Figure 5.21: Mixed background – Histograms of symmetry tracking errors.


5.4.3 Qualitative Comparison Between Symmetry and Colour

Colour is regularly used to perform object tracking. For example, the back projection of a hue histogram is used in Camshift [Bradski, 1998] to perform human face tracking. The same approach can also be used to track inanimate objects such as tennis balls and uniformly coloured cups. As discussed previously, the edge pixels that the fast symmetry detector uses as input data are visually orthogonal to value-based features such as colour. In this section, an attempt is made to compare the performance of bilateral symmetry and colour as tracking features.

Colour Blob Centroid

Before proceeding further, a colour-based feature must be chosen for the comparison against bilateral symmetry. As a symmetry line provides object pose information, the colour blob centroid seems to be a valid colour-based counterpart, as it also provides pose information. The colour blob centroid is extracted by finding the center of volume (zeroth moment) of a similarly coloured blob of pixels. Note however that the pose information provided by a colour centroid is different to that provided by a symmetry line. A colour centroid provides an (x, y) pixel location of an object's center of volume. A detected symmetry line provides object orientation and constrains the object location to the loci along the symmetry line. As the test object is symmetric, tracking accuracy is skewed favourably towards symmetry. As such, only a qualitative comparison is performed between colour and symmetry as tracking features.

The colour tracker uses the hue-saturation-value (HSV) colour space to represent pixel information. The target object's colour model is represented as a two-dimensional histogram, which stores hue and saturation information. The two-dimensional histogram quantizes hue and saturation into 45 and 8 bins respectively. The value component of HSV is only used to reject pixels that are very dark or very bright, which have unreliable hue information. The object's colour histogram is built offline and manually optimized prior to colour blob centroid detection. This is different from using symmetry as an object tracking feature, which requires no manual initialization.

The back projection image is obtained using the tracked object's hue-saturation histogram, according to the method described in [Swain and Ballard, 1991]. The back projection image represents the probability that a pixel in the input image belongs to the target object based on the pixel's colour. To provide a fair comparison against symmetry tracking, the colour tracker also uses the motion mask provided by the block motion detector to zero static portions of the back projection image. As with the symmetry tracking experiments, video images near the turning points of the pendulum are ignored. An example back projection result is shown in Figure 5.22. In the back projection image, darker pixels represent higher object probability.
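The histogram and back projection steps can be sketched as follows. The 45 × 8 hue-saturation binning and the use of the value channel only to reject unreliable pixels follow the description above; the pixel format, rejection thresholds and names are otherwise assumptions.

#include <vector>
#include <cstddef>

struct HsvPixel { float h, s, v; };   // h in [0, 360), s and v in [0, 1]

struct HueSatHistogram {
  static constexpr int HBINS = 45, SBINS = 8;
  float bins[HBINS][SBINS] = {};

  // The value channel is only used to reject pixels with unreliable hue.
  static bool reliable(const HsvPixel& p) { return p.v > 0.1f && p.v < 0.9f; }

  static int hueBin(float h) { int b = int(h / 360.0f * HBINS); return b < HBINS ? b : HBINS - 1; }
  static int satBin(float s) { int b = int(s * SBINS); return b < SBINS ? b : SBINS - 1; }

  void add(const HsvPixel& p) {        // accumulate object training pixels
    if (reliable(p)) bins[hueBin(p.h)][satBin(p.s)] += 1.0f;
  }
  void normalize() {                   // convert counts to probabilities
    float total = 0.0f;
    for (auto& row : bins) for (float b : row) total += b;
    if (total > 0.0f) for (auto& row : bins) for (float& b : row) b /= total;
  }
  float probability(const HsvPixel& p) const {
    return reliable(p) ? bins[hueBin(p.h)][satBin(p.s)] : 0.0f;
  }
};

// Back projection: per-pixel object probability, zeroed where the block motion
// mask (resampled to pixel resolution) reports no motion.
std::vector<float> backProject(const std::vector<HsvPixel>& image,
                               const std::vector<char>& motionMask,
                               const HueSatHistogram& hist)
{
  std::vector<float> out(image.size(), 0.0f);
  for (std::size_t i = 0; i < image.size(); ++i)
    if (motionMask[i]) out[i] = hist.probability(image[i]);
  return out;
}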


(a) Hue-saturation histogram of object

(b) Input

(c) Back projection

Figure 5.22: Hue-saturation histogram back projection. In the back projection image, darker pixels have a high probability of belonging to the object according to the hue-saturation histogram.


(a) White background

(b) Red distracters in background

Figure 5.23: Effects of different backgrounds on colour centroid. The back projection object blob is shown in yellow and the centroid is shown as a black dot.


A binary image is produced by thresholding the back projection image. The largest 8-connected blob is labelled as belonging to the target object. The colour blob centroid is found by calculating the zeroth moment, which is simply the center of volume of the pixels belonging to the object blob. Example object blobs are shown in Figure 5.23. The centroid location is shown as a black dot. In Figure 5.23(b), the large gap in the object blob is due to low object-background contrast, which causes gaps in the motion mask and subsequently the object blob.
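A sketch of the centroid extraction is given below: the back projection is thresholded, the largest 8-connected blob is kept, and the mean pixel position of that blob is returned. The threshold value and names are illustrative.

#include <vector>
#include <utility>
#include <cstddef>

struct Centroid { double x, y; bool valid; };

Centroid colourBlobCentroid(const std::vector<float>& backProj,
                            int width, int height, float threshold = 0.5f)
{
  std::vector<char> visited(backProj.size(), 0);
  std::vector<std::pair<int,int>> stack, pixels, bestPixels;

  for (int y = 0; y < height; ++y)
    for (int x = 0; x < width; ++x) {
      std::size_t idx = std::size_t(y) * width + x;
      if (backProj[idx] < threshold || visited[idx]) continue;
      // Flood fill one 8-connected blob of above-threshold pixels.
      pixels.clear();
      stack.clear();
      stack.push_back({x, y});
      visited[idx] = 1;
      while (!stack.empty()) {
        auto [px, py] = stack.back();
        stack.pop_back();
        pixels.push_back({px, py});
        for (int dy = -1; dy <= 1; ++dy)
          for (int dx = -1; dx <= 1; ++dx) {
            int qx = px + dx, qy = py + dy;
            if (qx < 0 || qy < 0 || qx >= width || qy >= height) continue;
            std::size_t qidx = std::size_t(qy) * width + qx;
            if (backProj[qidx] >= threshold && !visited[qidx]) {
              visited[qidx] = 1;
              stack.push_back({qx, qy});
            }
          }
      }
      if (pixels.size() > bestPixels.size()) bestPixels = pixels;   // keep largest blob
    }

  if (bestPixels.empty()) return {0.0, 0.0, false};
  double sx = 0.0, sy = 0.0;
  for (const auto& p : bestPixels) { sx += p.first; sy += p.second; }
  return {sx / bestPixels.size(), sy / bestPixels.size(), true};    // zeroth-moment centroid
}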

Error Metric for Colour Tracking

The ideal error measure is the distance between the detected centroid and a ground truth centroid location. However, ground truth requires manual segmentation of the object in all video images, including images where the test object is in front of other red objects. As the test object is symmetric, its colour centroid is located on its symmetry line. Therefore, a low centroid to symmetry line distance implies accurate centroid detection. As the tracking errors are used for a qualitative comparison, this sub-optimal metric is sufficient.
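The metric reduces to a point-to-line distance in the (R, θ) parameterization, as in the following sketch; R is measured from the image centre and the names are illustrative.

#include <cmath>

// Perpendicular distance from the detected colour centroid to the ground
// truth symmetry line given in (R, theta) polar form.
double centroidToLineDistance(double centroidX, double centroidY,
                              double R, double theta,
                              double imageW, double imageH)
{
  double x = centroidX - 0.5 * imageW;   // shift to image-centre coordinates
  double y = centroidY - 0.5 * imageH;
  return std::fabs(x * std::cos(theta) + y * std::sin(theta) - R);
}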

Error plots of the colour blob centroid are available from Figures 5.24 to 5.27. The mean-subtracted radius of the ground truth symmetry line is provided as a visual reference of the pendulum's motion. As the radius parameter is measured from the image center, zero crossings of the dotted line represent the middle of the pendulum swing, where the object has the highest speed.

Comparison Results

Figures 5.24 and 5.26 suggest that colour centroid tracking is accurate and reliable for the white background and the noisy edge background test videos. In Figure 5.26, the lack of jumps in error magnitude during ground truth zero crossings agrees with expectations, as the colour blob centroid does not rely on edge information. While edge information becomes less reliable due to motion blur when the object speed is high, hue and saturation are not affected by motion blur. Note that the cyclic nature of the centroid error in Figure 5.24 is caused by a flip in the sign of the ground truth symmetry line's radius parameter. The absolute magnitude of error is fairly stable across the entire plot.

The error plot for the video with red background distracters is shown in Figure 5.25. The plot shows a mean centroid error that is 4 to 5 times larger than that of the white background video. This decrease in tracking accuracy is due to the background distracters distorting the shape of the object blob, an example of which can be seen in Figure 5.23(b). The distortion is caused by the inability of the hue-saturation histogram to distinguish between object and background. The red distracters also caused gaps in the motion mask due to low object-background contrast, which further increases the centroid tracking error.


(Plot of Centroid Displacement Error (pixels) versus Frame Number for the white background video, with the mean-subtracted ground truth radius overlaid.)

Figure 5.24: White background – Colour blob centroid tracking error plot.

(Plot of Centroid Displacement Error (pixels) versus Frame Number for the video with red distracters, with the mean-subtracted ground truth radius overlaid.)

Figure 5.25: Background with red distracters – Colour blob centroid tracking error plot.


(Plot of Centroid Displacement Error (pixels) versus Frame Number for the video with background edge noise, with the mean-subtracted ground truth radius overlaid.)

Figure 5.26: Background with edge noise – Colour blob centroid tracking error plot.

(Plot of Centroid Displacement Error (pixels) versus Frame Number for the mixed background video, with the mean-subtracted ground truth radius overlaid.)

Figure 5.27: Mixed background – Colour blob centroid tracking error plot.


The detrimental effects of background distracters are further confirmed by the error plot of the mixed background video in Figure 5.27. The plot shows periodic increases in error magnitude. The uneven nature of these noise cycles is due to the object swinging over red distracters in the background, causing a shift in the location of the colour blob centroid.

Comparing the colour tracking error plots against those of the symmetry tracker, it is clear that symmetry and colour each have their own strengths and weaknesses as tracking features. Each feature should only be applied to tracking after carefully considering the visual characteristics of the target objects, the background and the requirements of higher level methods making use of the tracking results. For example, colour tracking does not assume object rigidity, which allows the tracking of deformable objects. Also, colour tracking is less susceptible to motion blur than symmetry.

On the other hand, the comparison shows that colour blob centroids are difficult to employ in situations where portions of the background have similar colour to the tracking target. Also, colour tracking requires a reasonably accurate histogram of the target's colour statistics, which may be difficult to obtain automatically. Colour models of all possible target objects are needed before tracking. Moreover, transparent and reflective objects take on the colour of their surroundings, which makes colour a poor choice as a tracking feature for these objects.

5.5 Chapter Summary

This chapter detailed a novel symmetry-based approach to object tracking. Experiments on ten challenging test videos show that the proposed method is able to robustly and rapidly track objects, operating at over 40 frames per second on 640 × 480 images. The experimental results also show that convergent tracking can be maintained for multi-colour objects and transparent objects. The symmetry tracker successfully dealt with difficult situations such as large object pose changes, occlusions and the presence of non-target symmetric objects. The tracker also produces a symmetry-refined motion segmentation of the tracked object in real time.

The discussion at the end of the previous chapter mentioned the problem of distinguishing between object and background symmetry lines. Due to the lack of object models, this is difficult without relying on prior knowledge such as table plane geometry or the expected orientation of a target object's symmetry line. This chapter has shown that motion is a useful visual cue for separating an object's symmetry line from static background symmetries. In the next chapter, the visual sensing repertoire developed so far is combined synergetically with robotic manipulation to actuate new objects in order to perform motion segmentation autonomously.


Vision without action is a dream. Action without vision is simply passing the time. Action with vision is making a positive difference.

Joel Barker

6 Autonomous Object Segmentation

6.1 Introduction

This chapter details a robust and accurate object segmentation method for near-symmetric objects placed on a table of known geometry. Here, object segmentation is defined as the problem of isolating all portions of an image that belong to a physically coherent object. The term near-symmetric is used as the proposed method can segment objects with some non-symmetric parts, such as a coffee mug and its handle. The proposed approach does not require prior models of target objects and assumes no previously collected background statistics. Instead, the approach relies on a precise robotic nudge to generate predictable object motion. Object motion and symmetry are combined to produce an accurate segmentation. The use of physical manipulation provides a completely autonomous solution to the problem of object segmentation. Experiments show that the resulting autonomous robotic system produces accurate segmentations, even when operating in cluttered scenes on multi-colour and transparent objects. A paper detailing the work in this chapter is available in press [Li and Kleeman, 2008].

6.1.1 Motivation

The work presented in this chapter is intended for use in domestic robotics applications as there are many objects with bilateral symmetry in most households. However, the sensing parts of the process, namely locating points of interest using symmetry triangulation and the symmetry-guided motion segmentation method, are applicable to other robotic tasks. The overall aim is to provide the robot with a general method to detect and segment common household objects such as cups, bottles and cans, without the burden of mandatory offline training for every new object. As the proposed approach assumes nothing about the appearance of the robot manipulator, the actuation of target objects can be provided by any manipulator capable of performing a robotic nudge as described in Section 6.3.


Object segmentation is an important sensory process for robots using vision. It allows a robot to build accurate internal models of its surroundings by isolating regions of an image that belong to objects in the real world. For domestic robots, the ability to quickly and robustly segment man-made objects is highly desirable. For example, a robot designed to clean and tidy desks will need to locate and segment common objects such as cups. Section 4.2.4 highlighted several limitations of symmetry-based static object segmentation. The proposed model-free approach was unable to recover asymmetric portions of near-symmetric objects, such as the handle of a mug. Also, segmentation results may not encompass the entire object due to distortions and gaps in the object's edge contour.

The problems above can be solved passively by using a high level method and object models, essentially converting the problem to one of object recognition. Robust multi-scale object recognition methods, such as SIFT [Lowe, 2004] and Haar boosted cascades [Viola and Jones, 2001], can imbue a robot with the ability to recognize previously modelled objects. However, training such methods requires large quantities of positive and negative sample images for each object. The number of training images is in the order of a thousand or greater if high accuracy is required. Manually labelling and segmenting images to produce training data for such methods is time consuming, especially for large sets of objects. Considering the large number of objects that are present in most real world environments, exhaustive training is impossible. Also, the introduction of novel objects into the operating environment will require offline training before the resumption of visual sensing.

The segmentation process described in this chapter attempts to address these problems by obtaining accurate object segmentations autonomously. This shifts the burden of training data collection from the human user to the robot. Also, the ability to detect and segment objects quickly, without having to pause for offline training, allows the robot to rapidly adapt to changing environments. Departing from the norm, a model-free approach to visual sensing is retained by physically manipulating objects using a precise robotic nudge. Returning to the desk cleaning robot example, in the case where it encounters a cup lacking a model, the robot will generate its own training data by nudging the cup over a short distance. In order to do this, the robot uses the arsenal of model-free sensing methods detailed in previous chapters.

6.1.2 Contributions

The sensing of objects using physical manipulation has been explored several times in the past. Tactile approaches use force sensors on the end effector to obtain information about an object's shape during a probing action [Cole and Yap, 1987] or a pushing motion [Jia and Erdmann, 1998; Moll and Erdmann, 2001]. The majority of active vision methods focus on moving the camera to obtain multiple views of the scene. Departing from the norm, the work of Fitzpatrick et al. [Fitzpatrick, 2003a; Fitzpatrick and Metta, 2003] uses the visual feedback during physical manipulation to investigate objects.


Their approach uses a poking action, which sweeps the end effector across the workspace. The presence of an object is detected by monitoring the scene visually, looking for a sharp jump in the amount of detected motion that occurs when the robot's end effector makes contact with an object. When the effector-object collision is detected, object segmentation is performed using a graph cuts approach. This section details the contributions of the proposed approach towards active vision. The proposed approach is also compared against that of Fitzpatrick et al., which partly inspired the work in this chapter.

Detecting Interesting Locations before Object Manipulation

By limiting the scope to near-symmetric objects, interesting locations can be found prior to robotic action. Similar to Section 4.3.5, objects are localized by looking for the intersection between each object's triangulated axis of symmetry and the table plane. These intersection points are clustered over multiple frames to obtain a stable estimate of object locations. Due to the low likelihood of two background symmetry lines triangulating to an axis that passes through the table plane within the robot manipulator's reachable workspace, the majority of clusters represent locations that will yield useful information when physically explored. In essence, the interesting locations provide the robot with a set of expectations to test using physical manipulation.

Finding interesting locations prior to robotic action provides several advantages. Firstly, the detected interesting locations provide a subset of workspace areas that have high probability of containing an object. This is different from the approach of Fitzpatrick et al., which assumes that all parts of the unexplored workspace have equal probability of containing an object. For scenes with sparse spatial distribution of objects, the pre-action planning provided by the sensing of interesting locations will drastically reduce exploration time. Secondly, interesting locations allow the use of exploration strategies. For example, locations near the camera can be explored first, as they are less likely to be occluded by other objects. In combination with dense stereo, which uses visually orthogonal information to stereo symmetry triangulation, obstacles on the table top should be identifiable. This will allow path planning with obstacle avoidance prior to robotic action.

Note that the author is not claiming superiority over the approach of Fitzpatrick et al. Their object exploration method is designed to be highly general for the purposes of investigating the visual-motor response and measuring object affordances. Fitzpatrick's PhD thesis [Fitzpatrick, 2003b] also makes use of the same exploration method to measure object affordances. Because their method imposes no constraints on object shape, no pre-action sensing can be performed to identify locations that are likely to contain objects.


Precise Manipulation of Objects

Experiments show that the robotic nudge action does not tip over tall objects such as empty bottles. The robotic nudge also does not damage fragile objects such as ceramic mugs. This level of gentleness in object manipulation is not demonstrated in the work of Fitzpatrick et al., which uses durable test objects such as toy cars and rubber balls. Also, their poking action actuates objects at a relatively high speed. This is probably due to their use of graph cuts segmentation at the moment of effector-object contact, which requires a significant amount of object motion between time-adjacent video images. The proposed segmentation approach does not rely on fast object motion, instead relying on the small object displacement caused by the robotic nudge. As such, the speed of the robotic nudge can be altered depending on the weight and fragility of the expected objects.

Due to the elastic actuators in the manipulator used by Fitzpatrick et al., their poking action is inherently imprecise. In contrast, the proposed method uses a short and accurate robotic nudge, applied only at locations of interest. While neither method directly addresses the problem of end effector obstacle avoidance, the small workspace footprint and short length of the robotic nudge make path planning easier. As the robotic manipulator uses traditional DC motors and high resolution encoders, the robotic nudge motion is inherently very precise. This allows the robotic nudge to be applied in cluttered scenes. The higher probability of glancing blows, together with the imprecise execution of trajectories when using elastic actuators, suggests that the poking approach of Fitzpatrick et al. is less suitable for cluttered environments.

Prevention of Poor Segmentations

The proposed segmentation approach incorporates several tests to prevent poor segmentation. Figure 6.2 contains a flowchart representing the segmentation process. The diamond-shaped blocks represent stages where segmentation can be restarted if undesirable situations occur. Firstly, if the target location is not nudgable, the robot restarts its segmentation process. A location is nudgable if it is within the robot manipulator's reachable workspace and there is sufficient clearance to nudge the target object without any collisions with other objects.

Secondly, the robot looks for object motion during the nudge. If no object motion is detected during the robotic nudge, the segmentation process is restarted. This ensures that a segmentation is not attempted if the robotic nudge does not cause sufficient object motion. The motion detection step also prevents the segmentation of phantom objects caused by the triangulation of two background symmetry lines. As such, the robotic nudge simultaneously tests for the existence of an object at the target location and allows motion segmentation if an object is present. Unlike the approach of Fitzpatrick et al., a sharp jump in scene motion is not used as an indicator of a moving object. Instead, motion is detected only at the location being nudged, using the target object's symmetry line as a barrier. This is possible because the robotic nudge is a planned motion targeting a specific location, allowing robust rejection of robot arm motion. The proposed object motion detection method is further detailed in Section 6.3.2.

Thirdly, the robot tracks the object's symmetry line in stereo. If either tracker diverges, the robot abandons the current segmentation attempt and the process is restarted. The segmentation process is also restarted if the orientation of the object's symmetry line changes dramatically between the estimates before and after the robotic nudge. These checks ensure that segmentation is not performed if an object is tipped over or pushed behind an occlusion. As the approach of Fitzpatrick et al. does not forecast the time and location of object-manipulator contact, it applies no prevention scheme against insufficient object motion. This can result in near-empty segmentations due to robot arm motion being interpreted as object motion. Also, the robot arm is sometimes included in their object segmentation results.

Robust Segmentation using Symmetry and Motion

While appearing similar at a glance, the proposed approach to visual segmentation is very different from that of Fitzpatrick et al. Their approach uses video frames during the robotic action, around the time of contact between the end effector and the object. Because their segmentation is initiated by a jump in detected motion, unlucky frame timing with respect to the time of contact can produce poor segmentations. Several incorrect segmentations are highlighted in Figure 11 of [Fitzpatrick and Metta, 2003], which shows the inclusion of the end effector in some segmentation results. Also, near-empty segmentations can be returned by their approach.

These problems never occur in the proposed approach as the video images used for motion segmentation are temporally further apart, captured before and after the nudge but never during a robotic action. This explicitly prevents the inclusion of the robot's end effector in the motion segmentation results. The motion detection approach also prevents the production of near-empty segmentations. While not explicitly documented, it seems that their choice of motion threshold will depend on object size and the speed of impact. As the proposed approach only looks for any object motion, not a change in the quantity of motion, the motion threshold is fairly arbitrary.

The proposed object segmentation approach is novel. The use of symmetry combined with robot-generated object motion produces accurate object segmentations robustly and autonomously. This model-free approach is able to segment multi-colour and transparent objects in clutter. High resolution 1280 × 960 pixel images are used to produce more accurate segmentations. Each of these images has four times the pixel data of the 640 × 480 images used in the static segmentation experiments presented in Section 4.2. However, as the proposed segmentation method only requires a single pass and uses low computational cost operations, the robot is able to obtain segmentations very rapidly. Timing trials conducted on a 1.73GHz Pentium M laptop measured a mean execution time of 80ms over 1000 segmentations, which includes the time taken to calculate the frame difference. This execution time is much lower than the many seconds required by the graph cuts method used in the system of Fitzpatrick et al., suggesting that the proposed symmetry-based segmentation approach is well suited for time-critical applications where low execution times are essential.

6.1.3 System Overview

The hardware components of the robot are shown in Figure 6.1. The stereo cameras consist of two Videre Design IEEE1394 CMOS cameras verged together at around 15 degrees from parallel. These cameras capture 640 × 480 images at 25Hz during all parts of the segmentation process, except for high resolution 1280 × 960 snapshots of the scene taken before and after the robotic nudge. The cameras are mounted on a tripod and positioned to emulate the eye-manipulator geometric relationship of a humanoid robot.

Figure 6.1: Robotic system hardware components.

The PUMA 260 robot manipulator has six degrees of freedom. The PUMA arm and stereo cameras are set up to roughly simulate the manipulator-camera relationship of a humanoid robot. Note that the PUMA arm is driven using a custom-built controller so that CPU processing can be focused on visual sensing. Details of the new controller are available in Appendix B. The calibration grid is used to perform camera-arm calibration and to estimate the geometry of the table plane. Details of the calibration and table plane estimation are described in Section 6.1.4.

The autonomous segmentation process is summarized in Figure 6.2. The robot begins by surveying the scene for interesting locations to explore. The details of this process are described in Section 6.2. Once an interesting location has been found, the robot manipulator nudges the target location. If motion is detected during the nudge, stereo tracking is initiated to keep track of the moving object. Section 6.3 describes the robotic nudge and stereo tracking. If tracking converges, the object is segmented using the method described in Section 6.4. As discussed in previous chapters, bilateral symmetry is the backbone feature of the visual processing.

[Figure 6.2 flowchart: START → Find Interesting Locations → select the nudgable location nearest the camera → Nudge Location → Motion Detected? → Stereo Tracking → Tracking Successful? → Segment Object → RESULT. A failed check at any decision point restarts the process at Find Interesting Locations.]

Figure 6.2: Autonomous object segmentation flowchart.
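
The following listing sketches the decision structure of Figure 6.2 in C++. It is only an illustration of the control flow; the helper functions (findInterestingLocations, isNudgable, nudgeAndDetectMotion, stereoTrackingConverged, segmentObject) are hypothetical placeholders for the components described in Sections 6.2 to 6.4 and are not part of the thesis implementation.

```cpp
// Sketch of the decision structure in Figure 6.2. All helper functions are
// hypothetical placeholders, declared but not defined here.
#include <optional>
#include <vector>

struct Location { double x, y; };                  // table-plane coordinates (mm)
struct Segmentation { /* binary object mask */ };

std::vector<Location> findInterestingLocations();   // Section 6.2, ordered nearest-camera first
bool isNudgable(const Location& loc);                // reachability and clearance check
bool nudgeAndDetectMotion(const Location& loc);      // Section 6.3: true if object motion is seen
bool stereoTrackingConverged();                      // both symmetry trackers converge upright
Segmentation segmentObject();                        // Section 6.4

std::optional<Segmentation> autonomousSegmentation() {
    for (const Location& loc : findInterestingLocations()) {
        if (!isNudgable(loc))           continue;   // try the next interesting location
        if (!nudgeAndDetectMotion(loc)) continue;   // no motion: the location was empty
        if (!stereoTrackingConverged()) continue;   // tracker diverged or object tipped over
        return segmentObject();
    }
    return std::nullopt;                             // nothing segmented: survey the scene again
}
```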


Note that while the flowchart shows a serial progression through the segmentation process, several parallel threads of execution take place in practice. Firstly, two threads return video images from the stereo camera pair in real time at 25 frames per second. The visual processing to detect interesting locations and perform stereo tracking takes place in parallel, alongside the camera threads. During the nudge, the robot arm controller commands the manipulator in parallel with the visual processing and the camera threads, making sure that the planned motion trajectories are being followed. This control thread executes on its own processing unit on a PCI servo controller card. The controller card commands the arm motors and monitors the joint encoders in order to execute smooth trajectories.

6.1.4 System Calibration

The robot requires two pieces of prior knowledge in order to perform autonomous segmentation. Firstly, the robot requires the geometry of the table plane. This allows the robot to move its end effector parallel to the table when performing the nudge. Secondly, the robot needs the geometric relationship between its stereo cameras and its manipulator. This allows object locations obtained via visual sensing to be used for robotic manipulation. Both pieces of prior knowledge are obtained during system calibration. As the robot manipulator is mounted on the table, calibration only needs to be repeated if the stereo cameras' pose changes relative to the table.

System calibration is performed as follows. Firstly, the stereo cameras are calibrated using the MATLAB camera calibration toolbox [Bouguet, 2006]. The intrinsic parameters of each camera in the stereo pair are obtained individually. This is followed by a separate calibration to obtain the extrinsics of the stereo system. After calibrating the stereo camera pair, the robot can triangulate locations in 3D relative to the camera's coordinate frame. The geometry of the table is found by fitting a plane to the corners of the checker pattern, the locations of which are found using standard stereo triangulation.
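
As an illustration of the table plane estimation step, the sketch below fits a plane to the stereo-triangulated checkerboard corners using a total least-squares formulation. This is one plausible way of realising the plane fit described above; the thesis does not specify the exact fitting method, so the details here are assumptions.

```cpp
// Minimal sketch of fitting the table plane to triangulated checkerboard corners
// expressed in the camera frame. The plane is returned as a unit normal n and
// offset d such that n.dot(p) + d = 0 for points p on the plane.
#include <Eigen/Dense>
#include <vector>

struct Plane { Eigen::Vector3d normal; double d; };

Plane fitTablePlane(const std::vector<Eigen::Vector3d>& corners) {
    // The centroid of the corners lies on the best-fit plane.
    Eigen::Vector3d centroid = Eigen::Vector3d::Zero();
    for (const auto& p : corners) centroid += p;
    centroid /= static_cast<double>(corners.size());

    // Stack the mean-subtracted points; the singular vector with the smallest
    // singular value is the plane normal (total least squares).
    Eigen::MatrixXd A(corners.size(), 3);
    for (std::size_t i = 0; i < corners.size(); ++i)
        A.row(i) = (corners[i] - centroid).transpose();

    Eigen::JacobiSVD<Eigen::MatrixXd> svd(A, Eigen::ComputeThinV);
    Eigen::Vector3d normal = svd.matrixV().col(2);

    return { normal, -normal.dot(centroid) };
}
```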

To find the manipulator-camera transformation, the robot uses the calibration checkerboard pattern on the table. The corners of the checkerboard pattern are placed at known locations in the robot manipulator's coordinate frame. This is achieved by drawing a grid of points on the table using a marker-pen end effector attached to the manipulator. The checkerboard pattern is then placed on the table with its corners aligned with the manipulator-drawn grid points. The corners of the checkerboard are then triangulated using the stereo camera pair, obtaining their locations in the camera coordinate frame. The manipulator-camera frame transformation is then found by solving the absolute orientation problem, which returns the transformation that maps points between the camera and manipulator frames. The PCA solution proposed by [K. S. Arun and Blostein, 1987] is used to obtain the transformation.
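
The listing below sketches the standard SVD-based least-squares solution to the absolute orientation problem, in the style commonly associated with Arun et al., as one way of computing the camera-to-manipulator transformation from the matched checkerboard corners. It is a hedged illustration rather than the exact routine used in the thesis.

```cpp
// Sketch of the least-squares rigid transform between matched point sets:
// camera-frame corners p_i map to manipulator-frame grid points q_i via q ~ R p + t.
#include <Eigen/Dense>
#include <vector>

struct RigidTransform { Eigen::Matrix3d R; Eigen::Vector3d t; };

RigidTransform cameraToArm(const std::vector<Eigen::Vector3d>& p,   // camera frame
                           const std::vector<Eigen::Vector3d>& q) { // manipulator frame
    Eigen::Vector3d pc = Eigen::Vector3d::Zero(), qc = Eigen::Vector3d::Zero();
    for (std::size_t i = 0; i < p.size(); ++i) { pc += p[i]; qc += q[i]; }
    pc /= static_cast<double>(p.size());
    qc /= static_cast<double>(q.size());

    // Cross-covariance of the mean-subtracted correspondences.
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (std::size_t i = 0; i < p.size(); ++i)
        H += (p[i] - pc) * (q[i] - qc).transpose();

    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d R = svd.matrixV() * svd.matrixU().transpose();
    if (R.determinant() < 0) {               // guard against a reflection solution
        Eigen::Matrix3d V = svd.matrixV();
        V.col(2) *= -1.0;
        R = V * svd.matrixU().transpose();
    }
    return { R, qc - R * pc };
}
```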


6.2 Detecting Interesting Locations

6.2.1 Collecting Symmetry Intersects

The process of detecting interesting locations begins with the collection of symmetry intersects across multiple video images. The use of multiple images ensures that the detected interesting locations are temporally stable. Symmetry lines are detected in the left and right camera images over 25 time-contiguous video frames. As the stereo cameras record at 25 frames per second, data collection only requires one second.

All possible pairings of symmetry lines between the left and right images are triangulated to form 3D axes of symmetry using the method described previously in Section 4.3.3. In the experiments, three symmetry lines are detected for each camera image, resulting in a maximum of nine triangulated axes of symmetry. Symmetry axes that are more than 10 degrees from being perpendicular to the table plane are rejected. The intersection points between the remaining symmetry axes and the table plane are calculated. To prevent the detection of interesting locations outside the robot's reach, only symmetry intersections within the workspace of the robot manipulator are collected for clustering. The clustering method is given the 2D locations of the symmetry intersects on the table plane.
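
The sketch below illustrates how a triangulated symmetry axis can be converted into a candidate table intersect and filtered, assuming the axis is available as a 3D point and unit direction and the table plane is known from calibration. The 10-degree tilt threshold and the 150-350 mm workspace radii follow the values quoted in this chapter; everything else is an illustrative assumption.

```cpp
// Sketch: intersect a triangulated symmetry axis with the calibrated table plane
// and apply the perpendicularity and workspace checks described above.
#include <Eigen/Dense>
#include <cmath>
#include <optional>

struct Plane { Eigen::Vector3d n; double d; };        // plane satisfies n.dot(x) + d = 0
struct Axis  { Eigen::Vector3d point, dir; };         // dir is a unit vector

std::optional<Eigen::Vector3d> tableIntersect(const Axis& axis, const Plane& table,
                                              double maxTiltDeg = 10.0) {
    // Reject axes more than maxTiltDeg away from the table normal, i.e. axes that
    // are not roughly perpendicular to the table plane.
    const double cosLimit = std::cos(maxTiltDeg * 3.14159265358979 / 180.0);
    if (std::abs(axis.dir.dot(table.n)) < cosLimit) return std::nullopt;

    // Line-plane intersection: find s such that point + s * dir lies on the plane.
    const double denom = table.n.dot(axis.dir);
    const double s = -(table.n.dot(axis.point) + table.d) / denom;
    return axis.point + s * axis.dir;
}

// Keep only intersects inside the annular manipulator workspace (radii from Section 6.3).
bool inWorkspace(const Eigen::Vector3d& p, const Eigen::Vector3d& armBase,
                 double rMin = 150.0, double rMax = 350.0) {                 // mm
    const double r = (p - armBase).norm();
    return r >= rMin && r <= rMax;
}
```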

It is possible for two background symmetry lines to produce a valid symmetry axis through triangulation. In practice, the perpendicular-to-table constraint rejects the majority of non-object symmetry axes. The limited workspace of the robot manipulator also implicitly removes many background symmetry axes. In the rare case where a phantom symmetry axis results in the detection of an interesting location, the robotic nudge will quickly confirm that the location is empty. Note that the robot will not attempt to segment an empty location as no object motion will be induced by the robotic nudge.

6.2.2 Clustering Symmetry Intersects

The intersections between valid symmetry axes and the table plane are collected over 25 pairs of video frames and recorded as 2D locations on the table plane. These intersect locations are grouped into clusters using the QT algorithm [Laurie J. Heyer and Yooseph, 1999], which has been modified to deal with 2D input data. The QT algorithm works by looking for clusters that satisfy a quality threshold. Unlike K-means clustering, the QT clustering algorithm does not require any prior knowledge of the number of actual clusters. This is important as it frees the robot from making an assumption about the number of objects on the table.

Several modifications are made to the QT algorithm. Firstly, the recursion in the algorithm is removed to reduce its computational cost. The original QT method is designed for clustering genetic information, where an upper limit for the number of clusters is unknown. However, as the maximum number of object symmetry lines the robot can detect is limited to three per video image, the recursion is replaced with loop iterations. Secondly, a minimum cluster size threshold is used to reject temporally unstable symmetry intersects. As the maximum number of symmetry axes per stereo image pair is nine, a temporally stable symmetry axis will contribute at least 1/9 of the intersects used in clustering. As such, the minimum cluster size is set to one-ninth of the total number of intersects.

The QT clustering is performed as follows. The algorithm iterates through each symmetry intersect, adding all other intersects within a threshold distance. The cluster with the largest number of intersects is returned. The algorithm repeats on the remaining intersects until the required number of clusters has been returned. In the experiments, the clusters are limited to 10mm in radius. This prevents the formation of clusters that include symmetry intersects from multiple objects. The geometric centroids of the clusters are used by the robot as a list of interesting locations to explore. A nudge is performed on the valid location closest to the right camera. A location is deemed invalid if the end effector would collide with other interesting locations during the robotic nudge.
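
A non-recursive sketch of the modified QT clustering described above is given below. The 10 mm cluster radius, the limit of three clusters and the one-ninth minimum cluster size follow the text; using the distance to the seed intersect as the quality measure is a simplification of the original QT criterion.

```cpp
// Non-recursive QT-style clustering of 2D symmetry intersects on the table plane.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };        // symmetry intersects on the table plane (mm)

std::vector<Pt> clusterCentroids(std::vector<Pt> pts,
                                 double radius = 10.0, int maxClusters = 3) {
    std::vector<Pt> centroids;
    const std::size_t minSize = std::max<std::size_t>(1, pts.size() / 9);

    while (!pts.empty() && static_cast<int>(centroids.size()) < maxClusters) {
        std::vector<std::size_t> best;
        for (std::size_t i = 0; i < pts.size(); ++i) {      // candidate cluster seeded at i
            std::vector<std::size_t> members;
            for (std::size_t j = 0; j < pts.size(); ++j)
                if (std::hypot(pts[j].x - pts[i].x, pts[j].y - pts[i].y) <= radius)
                    members.push_back(j);
            if (members.size() > best.size()) best = members;
        }
        if (best.size() < minSize) break;                    // remaining intersects are unstable

        Pt c{0.0, 0.0};                                      // geometric centroid of the cluster
        for (std::size_t idx : best) { c.x += pts[idx].x; c.y += pts[idx].y; }
        c.x /= best.size();  c.y /= best.size();
        centroids.push_back(c);

        std::vector<Pt> rest;                                // remove the clustered intersects
        for (std::size_t j = 0, k = 0; j < pts.size(); ++j) {
            if (k < best.size() && best[k] == j) { ++k; continue; }
            rest.push_back(pts[j]);
        }
        pts.swap(rest);
    }
    return centroids;
}
```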

6.3 The Robotic Nudge

After selecting an interesting location on the table to explore, a robotic nudge is applied to the target location. The robotic nudge is used to detect the presence of an object as well as to generate the necessary object motion to perform segmentation. The robotic nudge is designed to actuate objects across the table in a controlled manner. It is able to move fragile objects such as ceramic mugs as well as objects with a high center of gravity such as empty beverage bottles. Visual sensing occurs in parallel with the robotic nudge. Once object motion is detected, stereo tracking is used to monitor the nudge to ensure sufficient and consistent object actuation.

6.3.1 Motion Control

The motion of the robot’s end effector during a nudge is shown in Figures 6.3 and 6.4.The L-shaped protrusion is made of sponge to provide damping during contact, which isespecially important when nudging brittle objects such as ceramic cups. The L-shapedsponge also allows object contact to occur very close to the table plane. By applying forceto the bottom of objects, nudged objects are less likely to tip over.

Figure 6.3 shows the side view of the robotic nudge motion. The nudge begins by lowering the gripper from P0 to P1. The height of the gripper at location P0 is well above the height of the tallest expected object. Dmax is set to ensure that the L-shaped sponge will not hit the largest expected object during its descent. In the experiments, Dmax is set to allow at least 20mm of clearance.

After arriving at P1, the gripper travels towards P2. Dmin is selected such that the gripper will make contact with the smallest expected object before arriving at P2. Dmin is set to 10mm in the experiments. The gripper then retreats back through P1 to P0.


[Figure 6.3 diagram: side view showing the gripper waypoints P0, P1 and P2 relative to the target object's symmetry line and the table, the clearance distances Dmax and Dmin, and the region beyond the symmetry line where object motion is detected.]

Figure 6.3: The robotic nudge – Side view.

In early tests, the gripper was moved directly from P2 back to P0. This knocked over tapered objects such as the blue cup in Figure 6.8(a) due to friction between the soft sponge and the object's outer surface. Note that the end effector never visually crosses an object's symmetry line during any part of the robotic nudge.

An overhead view of the nudge is provided in Figure 6.4. The nudge motion is perpendicular to the line formed between the focal point of the right camera and the location being nudged. If an object is present at the location, the robotic nudge will actuate the object horizontally across the camera's image. This reduces the change in object scale caused by the nudge and also lowers the probability of glancing contact, improving the quality of segmentation.

[Figure 6.4 diagram: overhead view showing the right camera, the target object and its symmetry line, the hard foam and soft sponge parts of the end effector, and the nudge vector from P0/P1 to P2, perpendicular to the camera-to-target line.]

Figure 6.4: The robotic nudge – Overhead view.
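
The sketch below illustrates one way the nudge waypoints could be planned from the selected table intersect and the right camera position, assuming both are expressed in a manipulator frame whose z-axis is normal to the table. The perpendicular nudge direction follows Figure 6.4 and the Dmax/Dmin clearances follow the text, but the specific heights and default distances are illustrative placeholders rather than the thesis values.

```cpp
// Sketch of nudge waypoint planning under the assumptions stated above.
#include <Eigen/Dense>

struct NudgeWaypoints { Eigen::Vector3d P0, P1, P2; };

NudgeWaypoints planNudge(const Eigen::Vector3d& target,    // symmetry-axis / table intersect
                         const Eigen::Vector3d& camera,    // right camera focal point
                         double approachHeight = 120.0,    // P0 height above P1 (mm, illustrative)
                         double nudgeHeight    = 5.0,      // P1/P2 height above the table (mm, illustrative)
                         double Dmax = 60.0, double Dmin = 10.0) {
    // Horizontal camera-to-target direction, then a 90 degree rotation in the
    // table plane gives a nudge direction perpendicular to the viewing line
    // (the sign, i.e. nudging left or right across the image, is arbitrary here).
    Eigen::Vector3d view = target - camera;
    view.z() = 0.0;
    view.normalize();
    const Eigen::Vector3d nudgeDir(-view.y(), view.x(), 0.0);

    NudgeWaypoints w;
    w.P1 = target - Dmax * nudgeDir + Eigen::Vector3d(0.0, 0.0, nudgeHeight); // clear of the object
    w.P2 = target - Dmin * nudgeDir + Eigen::Vector3d(0.0, 0.0, nudgeHeight); // stops short of the axis
    w.P0 = w.P1 + Eigen::Vector3d(0.0, 0.0, approachHeight);                  // above the tallest object
    return w;
}
```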


A robotic nudge captured by the right camera is shown in Figure 6.5. The end effector appears mirrored in the video images because the diagram in Figure 6.3 is drawn from the experimenter's point of view. Notice that the nudge only moves the transparent cup a short distance. This prevents large object pose changes that can negatively affect the segmentation results. Also, the small workspace footprint of the nudge reduces the probability of object collisions when the robot operates on cluttered scenes.

Figure 6.5: Consecutive right camera video images taken during the P1–P2–P1 portion of the robotic nudge.

After selecting an interesting location, P0, P1 and P2 are determined based on the camera's location. Using inverse kinematics, linearly-interpolated encoder values are generated at runtime to move the end effector smoothly between these three points. In the robotic experiments, the above information is displayed to the user before the nudge. Figure 6.6 shows an example of the nudge visualization provided to the user. The target location is shown as a blue circle, with the cluster centroid drawn as a blue dot. The motion of the planned nudge is drawn as a green arrow. The camera location is shown as a red rectangle on the left of the image. The workspace of the robot manipulator is coloured white. The radii of the inner and outer borders of the workspace are roughly 150mm and 350mm from the base joint of the robot arm.


Figure 6.6: Workspace visualization of robotic nudge.

6.3.2 Obtaining Visual Feedback by Stereo Tracking

During the nudge, the robot monitors the right camera image for object motion. Motion detection is performed using the block motion method described previously in Section 5.2.2. The robot only monitors for object motion in the detection region marked in Figure 6.3, on the far side of the target's symmetry line. As the end effector never crosses the symmetry line during the nudge, the ego motion of the robot will not be detected as object motion. If no object motion is found, the entire segmentation process can be restarted on the next interesting location. Monitoring for object motion during the nudge allows the robot to identify empty interesting locations, which prevents the production of near-empty segmentations as encountered by Fitzpatrick et al. Similarly, poor segmentations due to insufficient object motion caused by glancing contact are also avoided.
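
As a rough illustration of restricting motion detection to one side of the symmetry line, the sketch below counts moved 8 × 8 blocks only in the half-plane opposite the approaching end effector. The block-motion map is assumed to come from the method of Section 5.2.2; the data layout and the minimum block count are assumptions made for this example, not the thesis parameters.

```cpp
// Count moved blocks only on the side of the symmetry line away from the arm.
#include <cstddef>
#include <vector>

struct Line2D { double px, py, dx, dy; };   // symmetry line: point + direction, image pixels

bool objectMotionDetected(const std::vector<std::vector<bool>>& blockMoved, // [row][col]
                          const Line2D& symLine, int blockSize, int armSide,
                          int minBlocks = 4) {                              // illustrative threshold
    int count = 0;
    for (std::size_t r = 0; r < blockMoved.size(); ++r) {
        for (std::size_t c = 0; c < blockMoved[r].size(); ++c) {
            if (!blockMoved[r][c]) continue;
            // Block centre in pixels, then a signed side test against the symmetry line.
            const double x = (c + 0.5) * blockSize, y = (r + 0.5) * blockSize;
            const double side = (x - symLine.px) * symLine.dy - (y - symLine.py) * symLine.dx;
            if (side * armSide < 0.0) ++count;   // only count motion opposite the arm's side
        }
    }
    return count >= minBlocks;
}
```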

Once object motion is detected, the robot begins tracking the target object's symmetry line. Tracking is performed in stereo using two independent symmetry trackers, one for each camera. Each tracker is identical to the one described previously in Chapter 5. Tracking stops after the nudge is complete and sufficient time has been given to allow the nudged object to cease all motion. If both trackers converge, the final tracking estimates from the left and right trackers are triangulated to form a 3D symmetry axis. If the symmetry axis is roughly perpendicular to the table plane, object segmentation is performed. If either tracker fails or the resulting symmetry axis is no longer perpendicular to the table, object segmentation is not performed. This prevents poor segmentations due to unwanted object motion such as the target object being tipped over.


6.4 Object Segmentation

Segmentation is performed using the object motion generated by the robotic nudge. Figure 6.7 illustrates the major steps of the proposed object segmentation method. Figure 6.7(a) and Figure 6.7(b) are images taken with the right camera before and after the robotic nudge. The absolute frame difference between the before and after images is shown in Figure 6.7(c). The object's symmetry lines are overlaid on top of the frame difference image. The symmetry line of the object before the nudge is drawn in green and the line after the nudge is coloured red.

Thresholding the raw frame difference will produce a binary mask that includes many background pixels. The mask will also have a large gap at the center due to the low frame difference in the interior of the nudged object. By using the object's symmetry line, these problems can be overcome. Figure 6.7(d) shows the compressed frame difference. This image is produced by removing the pixels between the two symmetry lines in the frame difference image. The region of pixels on the left of the red symmetry line is rotated so that the symmetry line is vertical. Similarly, the region on the right of the green symmetry line is rotated until the line is vertical. The compression is performed by merging the left and right regions so that the red and green symmetry lines lie on top of each other.

After compression, a small gap still remains in the frame difference image. This can be seen in Figure 6.7(d) as a dark vertical V-shaped wedge inside the cup-like shape. To remedy this, the robot exploits object symmetry to its advantage. Recall that the compression step merges the symmetry lines of the object in the before and after frames. Using this merged symmetry line as a mirror, the robot searches for motion on either side of it. If symmetric motion is found on the pixel pair on either side of the symmetry line, the pixels between the pair are filled in. This is similar to the motion mask refinement approach previously described in Section 5.2.3, except single pixels are used as opposed to 8 × 8 blocks. The resulting symmetry-refined image is shown in Figure 6.7(e). Finally, a segmentation binary mask is produced by thresholding the symmetry-refined difference image. The segmentation result in Figure 6.7(f) is produced by a simple pixel-wise logical AND between the binary mask and the after-nudge image.
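
A minimal sketch of the symmetry refinement step is shown below, assuming the compressed difference image has been rotated so that the merged symmetry line is the vertical column `axis`. For each row, the widest mirrored pixel pair with motion on both sides of the axis is found and the pixels between the pair are filled in, so that the subsequent threshold produces a solid mask.

```cpp
// Fill the interior of the compressed frame difference using mirrored motion pairs.
#include <algorithm>
#include <vector>

using Image = std::vector<std::vector<float>>;   // [row][col] frame-difference magnitude

void symmetryRefine(Image& diff, int axis, float motionThresh) {
    for (auto& row : diff) {
        const int width = static_cast<int>(row.size());
        const int maxOffset = std::min(axis, width - 1 - axis);

        // Widest mirrored pair with motion on both sides of the merged symmetry line.
        int fill = 0;
        for (int k = 1; k <= maxOffset; ++k)
            if (row[axis - k] >= motionThresh && row[axis + k] >= motionThresh)
                fill = k;

        // Fill the pixels between the pair so thresholding yields a solid region.
        for (int k = -fill; k <= fill; ++k)
            row[axis + k] = std::max(row[axis + k], motionThresh);
    }
}
```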


Figure 6.7: Motion segmentation using symmetry: (a) before nudge; (b) after nudge; (c) frame difference; (d) compressed difference; (e) symmetry refinement; (f) segmentation result. Note that the compressed difference and symmetry refinement images are rotated so that the object's symmetry line is vertical.


6.5 Autonomous Segmentation Results

Segmentation experiments were carried out on ten objects of different size, shape, texture and colour. Transparent, multi-coloured and near-symmetric objects are included in the test object set. Both plain and cluttered scenes were used in the experiments, with some objects set against two different backgrounds. While the experiments were carried out indoors, four bright fluorescent ceiling light sources provided uneven illumination in the scenes. In total, twelve experiments were carried out. The robot was able to autonomously segment the test object in all twelve experiments without any human aid.

Segmentation results are available from the nudge folder of the accompanying multimedia DVD. The segmentation results are tabulated in index.html. Each row of the table contains an image of the segmentation result alongside corresponding videos of stereo symmetry tracking and the robotic nudge as filmed from an external video camera. Note that the full resolution image is displayed by clicking on an image in the table. For safety reasons, a warning beacon is active when the robot manipulator is powered, periodically casting red light on the table. The beacon flash can be observed sporadically in the tracking videos.

The segmentation results and videos are also available online at www.ecse.monash.edu.au/centres/irrc/li_iro08.php.

6.5.1 Cups Without Handles

The test object set includes several types of common household objects. The first subset contains cups without handles. The segmentation is accurate and includes the entire cup. Figure 6.8(a) places a blue cup against background clutter. Again, the segmentation is accurate. This pair of segmentations illustrates the robustness of the autonomous approach when operating against different backgrounds.

The white cup in Figure 6.8(b) poses a different challenge to the segmentation process. Apart from its imperfect symmetry, the narrow stem of the cup results in a very small shift in object location after the nudge. This creates a narrow motion contour in the frame difference. The resulting segmentation shows that the proposed method only requires minute object actuation to function. Lastly, Figure 6.8(c) shows that the robot is able to autonomously obtain a very clean segmentation of a transparent cup against background clutter. This highlights the flexibility and robustness of the symmetry-based approach to motion segmentation, which is able to operate across objects of varying visual appearance.


Figure 6.8: Autonomous segmentation results – Cups: (a) blue cup in clutter; (b) near-symmetric white cup; (c) transparent cup in clutter.


6.5.2 Mugs With Handles

Unlike the dynamic programming segmentation approach described in Section 4.2, the new motion-based segmentation approach is able to include asymmetric portions of objects in the segmentation results. This capability is tested by using mugs with handles as test objects. The white mug in Figure 6.9(a) contains an asymmetric handle, which is successfully included in the segmentation result.

The multi-colour ceramic mug in Figure 6.9(b) is a brittle near-symmetric test object. The robotic nudge is able to gently actuate this fragile object while generating enough motion to initiate stereo tracking. The segmentation result includes the mug but also a small portion of the background. The reason for this can be seen in the stereo tracking videos. During the nudge, the L-shaped foam protrusion made contact with the tablecloth. This was due to a mechanical error in the PUMA 260 wrist joint that caused the end effector to dip towards the table.

Figure 6.9: Autonomous segmentation results – Mugs with handles: (a) white mug; (b) multi-colour mug.


6.5.3 Beverage Bottles

The remaining test objects are various bottles designed to store liquids. Due to their elongated shape, they generally have high centers of gravity, which means they are easy to tip over. This is especially true for the textured and transparent bottles as they are empty. These empty bottles also tend to wobble when actuated, which provides a challenge for the stereo symmetry tracking. Figures 6.10(a) and 6.10(b) show the segmentation results for two textured bottles against a plain background. Figure 6.10(c) shows the segmentation result of a textured bottle against background clutter. These results suggest that the robot is able to nudge and segment drink bottles robustly and accurately.

Next, a small water-filled bottle is used to test the strength and accuracy of the robotic nudge. Due to its size and weight, the nudge must be accurate and firm to produce sufficient object motion to initiate tracking. The successful segmentation in Figure 6.11(a) shows that the robotic nudge is capable of actuating small and dense objects.

Finally, the proposed segmentation approach is tested on an empty transparent bottle. Apart from the inherent visual difficulties posed by transparency, the top-heavy and light bottle also tests the robustness of the robotic nudge. Figures 6.11(b) and 6.11(c) show two segmentation results for the empty transparent bottle.

6.6 Discussion and Chapter Summary

This chapter has shown that object motion generated by robotic manipulation can be used to produce accurate and physically true segmentations. The robot autonomously and robustly carried out the entire segmentation process, including the robotic nudge used to actuate objects. By using the robotic nudge, the robot was able to segment a brittle multi-colour mug with an asymmetric handle as well as transparent objects. Experiments on household objects have shown that the robotic nudge is a robust and gentle method of actuating objects for the purpose of motion segmentation. The accurate segmentation results confirm that the implicit assumption that object scale and orientation are not significantly altered by the robotic nudge holds true in practice.

All twelve segmentation experiments were carried out successfully. However, it is possible for the robot to fail in obtaining an object segmentation. The most obvious issue is the lack of bilateral symmetry in the target object, which means the object will be invisible to the robot's symmetry-based vision system. As previously discussed, the fast symmetry detector operates on edge pixels as input. As such, visually orthogonal approaches operating on features such as colour and image gradients can be used synergistically to deal with asymmetric objects. However, the use of other vision methods will probably be predicated on object opaqueness. For example, stereo optical flow and graph cuts segmentation may fail due to the unreliable surface pixel information of transparent objects when they are under actuation.


Figure 6.10: Autonomous segmentation results – Beverage bottles: (a) green textured bottle; (b) brown textured bottle; (c) white textured bottle in clutter.


Figure 6.11: Autonomous segmentation results – Beverage bottles (continued): (a) small water-filled bottle; (b) transparent bottle; (c) transparent bottle in clutter.


In early experimental trials, the robot had a propensity to knock over test objects as its end effector did not have the L-shaped sponge protrusion. This led to the use of stereo tracking to ensure stable object motion before attempting object segmentation. The stereo trackers are able to prevent object segmentation if unstable object motion is induced by the robotic nudge. Trials where no object is present at the nudged location were also conducted during the prototyping stage. The lack of object motion prevented any object segmentation attempt, which essentially allows the robot to detect clear portions of the workspace using the robotic nudge. Also, the accidental actuation of asymmetric objects will not lead to an object segmentation attempt. This is because of the use of stereo tracking during the nudge, which will receive no symmetry measurements as the nudged object is not symmetric. The stereo tracker will diverge, resulting in the segmentation process being restarted before object segmentation can take place. Overall, the many checks and balances in the proposed autonomous approach ensure that poor segmentations rarely occur.

The proposed system does not explicitly address end effector path planning and obstacle avoidance. These problems are difficult to tackle without the use of 3D sensing or resorting to simple environments where the scene and test objects are previously modelled. As the symmetry-based sensing methods are unable to localize asymmetric objects in three dimensions, other stereo methods are required to construct reliable occupancy maps. Given the robot's knowledge of the table plane, a dense stereo approach should be able to extract enough surface information to allow for end effector path planning and obstacle avoidance. The problem of multiple symmetric objects within the robot's workspace is left to future work, although it seems straightforward to use the robotic nudge to clear up possible triangulation ambiguities that occur. As the robotic nudge is very compact in terms of its workspace usage, it is well suited for use in cluttered scenes where path planning must consider multiple obstacles and symmetric objects.

Now that the robot can autonomously obtain segmentations of new objects, many sensing and manipulation possibilities open up for exploration. For example, the segmentations obtained can be used as the positive images of a data set for training a boosted Haar cascade [Viola and Jones, 2001]. This will allow the robot to recognize previously segmented objects without requiring manual collection and labelling of training images. The next chapter shows that segmentations obtained using the robotic nudge enable the autonomous grasping and modelling of new objects.


Intelligence is the ability to adapt to change.

Stephen Hawking

7 Object Learning by Interaction

7.1 Introduction

In the previous chapter, the robot was imparted with the ability to detect and segment near-symmetric objects autonomously. The robot used a precise nudge to extract object information, removing the traditional reliance on prior knowledge such as shape and colour models. The previous chapter also suggested that autonomously obtained segmentations can be used as training data for an object recognition system. The work presented in this chapter confirms this suggestion experimentally. The robot learns and recognizes objects autonomously by integrating the work presented in previous chapters with new object manipulation and visual sensing methods. Training data is collected by physically grasping an object and rotating it in view of the robot's cameras. Experiments show that the robot is able to autonomously learn object models and then use these models to perform robust object recognition.

7.1.1 Motivations

Autonomous learning is an interesting concept as humans use it regularly to adapt to new environments. A robot with the ability to learn new objects without human guidance will be more flexible and robust across different operating environments. Instead of relying on manually collected training data or object models provided by humans, the robot can learn about objects all by itself. This shifts the burden of training data collection and model construction from human users to the robot. By doing so, the robot is now able to operate in environments such as the household, where the large number of unique objects makes exhaustive modelling and training intractable.

The autonomous use of robotic actions to nudge, grasp and rotate objects in order to produce reusable models is novel. The experimental platform provides a unique opportunity to examine the robustness and accuracy of an object recognition system trained using images obtained autonomously by a robot. David Lowe's scale invariant feature transform (SIFT) [Lowe, 2004] is used to model and recognize objects. As such, these experiments also provide an insight into the performance of SIFT for modelling grasped objects in a robotics context.

Overall, the primary goal remains the development of model-free approaches to robotics and computer vision. In essence, the aim is to provide the robot with as little prior knowledge as possible while still having it perform useful tasks such as object learning and recognition. The autonomous approach presented in this chapter will hopefully encourage more future work that focuses on the use of robotic action to perform recognition system training, as opposed to the currently accepted norm of offline training on manually obtained data.

7.1.2 Contributions

The work presented here makes several contributions to research in the areas of object manipulation, object learning and object recognition. On the global level, the completely autonomous nature of the proposed approach is novel. The autonomous use of object manipulation to generate training data for object learning has not been attempted in the past, only suggested as a future possibility. Experiments show that a robot can obtain training data by itself and use the robot-collected data to learn reusable visual models that allow robust object recognition. Research contributions are made in the following areas.

Performing Advanced Object Manipulations Autonomously

Existing approaches to object manipulation generally require object models or the fitting of geometric primitives to 3D data prior to robotic action. In his paper on object segmentation using a robotic poking action [Fitzpatrick, 2003a], Fitzpatrick proposed that it may be possible to perform what he termed advanced manipulations by using object information obtained via a simple manipulation. Probably due to the one-fingered end effector of his robot arm, Fitzpatrick did not address this suggestion any further in his research.

The work presented in this chapter confirms Fitzpatrick's suggestion. The robotic nudge is successfully used to obtain the necessary object information to perform a grasping operation. The autonomously obtained object segmentation is used in stereo to find the height of the nudged object. By using the relatively simple robotic nudge, the robot is able to grasp near-symmetric objects without requiring any prior geometric knowledge of the target object, such as its width and location. Upon a successful grasp, the robot rotates the object about its longitudinal axis to obtain views of the object at different orientations. After rotation, the object is replaced at its pre-nudge location. The robot's ability to leverage a simple object manipulation to perform more advanced manipulations is novel and useful in situations where the robot has to deal with new objects.


Autonomous Training Data Collection for Object Learning

The previous chapter suggested that autonomously collected object segmentations can be used as training data for an object recognition system. While the autonomously produced segmentations are accurate, they only present a single view of the nudged object. By grasping the nudged object, the robot gains access to multiple views of the object. Once an object has been grasped, the robot rotates the object over 360 degrees. Images are captured at 30-degree intervals, resulting in twelve training images for each object. This allows the construction of detailed object models that include the object's surface information across multiple orientations.

Traditionally, object modelling is done offline using a turntable-camera or scanning rangefinder system. The concept of autonomous training data collection using a robot manipulator for the purpose of object learning is relatively new. The quality of the robot-collected training data is evaluated experimentally. Images of grasped objects are used to train an object recognition system. The performance of the object recognition system provides an indication of the usefulness of the autonomously collected training data. Object recognition experiments also indirectly measure the effectiveness of the proposed autonomous approach to object learning.

Robust Object Recognition using SIFT Descriptors

SIFT descriptors [Lowe, 2004] are used to perform object recognition. After collecting training images of the grasped object from different orientations, SIFT descriptors are detected for each image. SIFT descriptor sets from different views of the same object are collected together to form an object model. Object recognition is performed by detecting descriptors in the input image and matching them against a database of object descriptors. The descriptor modelling process is further detailed in Section 7.3.

Traditionally, object recognition is performed using SIFT descriptors extracted from images where the target object has been segmented manually. The proposed approach departs from the norm as the training data is obtained autonomously by the robot. While the robotic nudge produces object segmentations that have similar accuracy to manual segmentations, object grasping is inherently error-prone. Due to the lack of object models, the robotic grasp is simply designed to hold the object firmly for rotation. The large concave foam pads on the robot's gripper, as seen in Figure 7.2(a), ensure a firm grasp but may displace the object or change the object's orientation. Once an object is grasped, the robot is no longer able to maintain an accurate estimate of the object's pose.

The imprecise grasping operation is a problem as the robot no longer has the ability to accurately distinguish between object and background features. Without a way to prune away background features, robust object recognition is impossible. This is especially crucial when using SIFT descriptors, as the inclusion of background descriptors may lead to repeated false positives. To overcome this problem, a pruning method has been developed to remove background SIFT descriptors by cross-examining the descriptors detected from all image views of the object. The proposed pruning method does not require bilateral symmetry to operate, which prevents background symmetries from disrupting the pruning process. As such, the pruning method can be used to remove background SIFT descriptors in any situation where an object is being rotated in front of a static scene. The pruning method is fully detailed in Section 7.3.3.

7.2 Autonomous Object Grasping After a Robotic Nudge

The object learning process begins with a robotic nudge as described previously in Section 6.3. After a successful robotic nudge, the motion segmentation method detailed in Section 6.4 is performed twice to produce an object segmentation for each camera image. The three-dimensional information needed to perform object grasping is obtained by triangulating the top of the object segmentations. Grasping is performed using a two-fingered robot gripper mounted at the end of a PUMA 260 manipulator. The robot manipulator and gripper are controlled using the custom-built controller detailed in Appendix B. This section details the method the robot uses to grasp objects after the robotic nudge.

7.2.1 Robot Gripper

Object grasping requires an end effector with two or more fingers. A gripper with more fingers generally provides a more stable grasp, as more points of contact are made with an object. However, more fingers also require more complex kinematic planning and introduce the need for self-collision avoidance. As the robot does not have detailed three-dimensional models of target objects, little benefit is gained from having more fingers on the gripper. Instead, the gripper and grasping approach are chosen to ensure a stable grasp. Concave foam pads are added to a two-fingered gripper to increase the area of gripper-object contact and thereby ensure grasp stability.

Figure 7.1 shows the gripper and angle bracket before the addition of foam pads. The gripper is an Otto Bock prosthetic hand, which has a wide maximum grasp width. The gripper has two fingers, with the left finger having two grey tips that make contact when grasping an object. As the gripper is a human prosthesis, the wrist's longitudinal axis is at an angle to the gripper's opening. A custom-made angle bracket is used to correct this misalignment, so that the gripper opening is in line with the wrist axis about which the gripper rotates. The angle bracket can be seen at the top left of Figure 7.1.


Figure 7.1: Robot gripper and angle bracket.

Figure 7.2 shows the modifications that have been made to the gripper. Blue foam pads have been added to the gripper to increase the contact area between the gripper fingers and a grasped object. The concave shape of the pads helps guide the grasped object towards the center of the gripper, which increases grasp stability. The blue foam is rigid, unlike the soft foam that is used to construct the L-shaped nudging protrusion. A small piece of metal was added to the bottom finger to compensate for the fact that the upper finger has two tips. Figure 7.2(b) shows the gripper at maximum opening width.

Figure 7.2: Photos of the robot gripper: (a) front view; (b) side view.


7.2.2 Determining the Height of a Nudged Object

Recall from the previous chapter that after a successful robotic nudge, stereo tracking provides the three-dimensional location of the nudged object's symmetry axis. The location of the object on the table can be found by calculating the intersection between its symmetry axis and the table plane. However, grasping also requires the height of the target object. This prevents the gripper from grasping air or hitting the top of the object during its descent. In addition, the grasp can be planned so that the gripper only occludes a small portion of the object, which allows more visual features to be identified during model building.

The motion segmentation method detailed previously in Section 6.4 is used to obtain two segmentations, one for each camera. In each segmentation, the top of the object in the camera image is detected using the following method. The largest 8-connected blob in the binary segmentation mask is found, while all other blobs are removed. This prevents background motion along the object's symmetry line from affecting the detected top-of-object location. The CvBlobs library [Borras, 2006] is used to perform the binary blob analysis. After identifying the object binary blob, the intersection between the post-nudge symmetry line estimate of the tracker and the top of the object binary blob is found. This intersection point is returned as the top of the object in each camera image. Figure 7.3 shows the top of a nudged object detected using this symmetry intersection approach.
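
The sketch below illustrates the symmetry intersection step, assuming the largest 8-connected blob has already been isolated (the thesis uses the CvBlobs library for the blob analysis). The post-nudge symmetry line is walked from the top of the image downwards and the first blob pixel encountered is returned as the top of the object.

```cpp
// Locate the top of the nudged object along the post-nudge symmetry line.
#include <optional>
#include <vector>

struct Pixel { int x, y; };
struct ImageLine { double px, py, dx, dy; };   // symmetry line estimate; assumed roughly vertical (dy != 0)

std::optional<Pixel> topOfObject(const std::vector<std::vector<bool>>& blobMask, // [row][col]
                                 const ImageLine& symLine) {
    const int rows = static_cast<int>(blobMask.size());
    const int cols = rows > 0 ? static_cast<int>(blobMask[0].size()) : 0;
    for (int y = 0; y < rows; ++y) {
        // x coordinate of the symmetry line at this image row.
        const double t = (y - symLine.py) / symLine.dy;
        const int x = static_cast<int>(symLine.px + t * symLine.dx + 0.5);
        if (x < 0 || x >= cols) continue;
        if (blobMask[y][x]) return Pixel{x, y};    // first blob pixel along the line
    }
    return std::nullopt;                            // the line misses the blob entirely
}
```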

Figure 7.3: Detecting the top of a nudged object: (a) left camera image; (b) right camera image. The top of the object's binary blob is shown as a black dot. The post-nudge symmetry estimate of the tracker within the object's binary blob is shown in red.

Now that the top of the object has been visually located in each camera image, stereo triangulation can be performed to determine the height of the object. However, there is an inherent uncertainty when using stereo triangulation to obtain an object's height. This is because the top of the object in the camera image physically belongs to the rear of the object. This results in a triangulated height that is always greater than the actual height of the object.


Figure 7.4 represents the pertinent geometry of a single camera from the stereo pair, the table plane and an object that is having its height triangulated. Notice that the blue line joining the camera's focal point and the top of the object in the camera view actually intersects the rear of the object. Performing stereo triangulation on the blue ray using verged stereo cameras will result in the location marked as a black dot in the figure. This location is higher than the object's height for any graspable object.

[Figure 7.4 diagram: a camera viewing an object of radius r sitting on the table, with the object's symmetry axis drawn vertically and the triangulated top of the object lying a distance d above the true object height along that axis.]

Figure 7.4: Uncertainty of stereo triangulated object height.

The object height error is labelled as d in the figure. The object radius is labelled as r. In cases where the object deviates from a surface of revolution, r represents the horizontal distance between the object symmetry axis and the point on the top of the object that is furthest from the camera. The angle between the camera's viewing direction and the table plane is labelled as θ. Using similar triangles, the height error d is described by the following equation.

d = r tan θ (7.1)

For the experimental platform, θ is roughly thirty degrees. Humanoid robots dealing with objects on a table at arm's length will have a similar θ angle. At thirty degrees, d is roughly 0.60 × r. As the radii of the top of the test objects range from 30mm to 90mm, d will be between 18mm and 54mm. Therefore, the gripper grasping coordinates are vertically offset by 36mm, which allows the robot to grasp all the test objects as long as the gripper maintains a vertical tolerance of ±18mm. The large foam pads attached to the gripper's fingers comfortably exceed this vertical tolerance. In a situation where the object widths are unknown, the gripper's maximum opening width can be used in place of the maximum object width to determine the vertical offset.
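
A short worked check of Equation 7.1, using the rounded factor of 0.60 quoted above for a thirty-degree viewing angle, reproduces the figures in this paragraph.

```cpp
// Worked check of Equation 7.1 with the values quoted in the text:
// theta = 30 degrees (tan 30deg rounded to 0.60) and top radii of 30 mm to 90 mm.
#include <cstdio>

int main() {
    const double k = 0.60;                        // rounded tan(theta) used in the text
    const double dMin = k * 30.0;                 // 18 mm error for the smallest radius
    const double dMax = k * 90.0;                 // 54 mm error for the largest radius
    const double offset = 0.5 * (dMin + dMax);    // 36 mm vertical offset of the grasp
    const double tol = 0.5 * (dMax - dMin);       // +/- 18 mm tolerance required of the gripper
    std::printf("d in [%.0f, %.0f] mm, offset %.0f mm, tolerance +/- %.0f mm\n",
                dMin, dMax, offset, tol);
    return 0;
}
```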


7.2.3 Object Grasping, Rotation and Training Data Collection

After determining the object height, grasping is performed by moving the gripper directly above the intersection between the object's symmetry axis and the table plane. The opened gripper is lowered vertically until the gripper fingers reach the object height. The gripper is closed and then raised vertically to lift the grasped object above the table. The gripper is raised such that the majority of the gripper is no longer visible in the right camera image. This prevents the inclusion of gripper descriptors in the object model.

The grasped object is rotated about the vertical axis of the robot manipulator wrist. Images of the object are taken at 30-degree intervals using the right camera in the stereo pair, resulting in a total of 12 images per grasped object. The 30-degree angle increment is chosen due to the use of SIFT descriptors for object modelling, which are reported by [Lowe, 2004] to tolerate viewing orientation changes of ±15 degrees. An example training data set is displayed in Figure 7.5.

Figure 7.5: Autonomously collected training data set – Green bottle.
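
The capture loop itself is simple; the sketch below shows its structure with hypothetical interfaces for the wrist rotation and the camera grab (rotateWristTo and grabRightCamera are placeholders and not the thesis API).

```cpp
// Capture twelve views of the grasped object at 30-degree steps, matching the
// +/-15 degree viewpoint tolerance reported for SIFT descriptors.
#include <vector>

struct Image { /* 640 x 480 grayscale pixels */ };

void rotateWristTo(double degrees);      // hypothetical manipulator call
Image grabRightCamera();                 // hypothetical camera call

std::vector<Image> captureTrainingViews() {
    std::vector<Image> views;
    for (int i = 0; i < 12; ++i) {
        rotateWristTo(i * 30.0);         // rotate the grasped object about its longitudinal axis
        views.push_back(grabRightCamera());
    }
    return views;
}
```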


Notice that in Figure 7.5, the green bottle's symmetry line changes its orientation across different training images. This is due to a slightly off-center grasp of the object, which produces a tilt in the bottle's longitudinal axis. The off-center grasp is due to symmetry axis triangulation error as well as mechanical offsets between the gripper and the robot manipulator wrist. After rotating the grasped object to obtain a set of twelve training images, the object is returned to its original pre-nudge location on the table. This allows for future revisits if further training data is needed.

Two videos explaining the autonomous grasping of nudged objects are available in the learning/explain folder of the multimedia DVD. Both videos include audio commentary by the thesis author. In combination, these two explanation videos demonstrate that the robot can autonomously grasp symmetric objects of different heights. The blue cup video contains a demonstration where a blue cup is nudged and then grasped autonomously by the robot. The video also shows the graphical user interface of the robotic system, which provides real-time visualization of the planned nudge, object segmentation and other pertinent information. The white bottle video shows the robot nudging, grasping and rotating a white bottle. The saving of experiment data such as tracking images has been minimized in this video, which allows the robot to perform the learning process at full speed. The object segmentation obtained using the robotic nudge is also shown in real time during this video.

7.3 Modelling Object using SIFT Descriptors

7.3.1 Introduction

The scale invariant feature transform (SIFT) [Lowe, 2004] is a multi-scale method that extracts descriptors from an image. SIFT descriptors are 128-value vectors that encode the image gradient intensity and orientation information within a Gaussian-weighted window. SIFT descriptors are highly distinctive, which makes them ideal for object recognition as the probability of confusing the descriptors of different objects is very low. The SIFT process is invariant against translation, rotation and scaling. SIFT descriptors are also invariant to illumination changes. The combination of geometric and illumination invariance allows for very robust object recognition on real world images.

SIFT is a two-step process. The first step detects stable regions in the input image. A comprehensive survey of region detectors is provided by [Mikolajczyk et al., 2005]. Lowe's recommended region detector finds stable locations in scale space, which essentially represent high contrast blob features within the input image. Interest point detectors such as the Harris corner detector [Harris and Stephens, 1988] and maximally stable extremal regions (MSER) [Matas et al., 2002] can also be used to produce stable regions.

After finding a list of stable regions in the image, a descriptor is extracted from each region. A survey of descriptors is provided in [Mikolajczyk and Schmid, 2005].


Descriptors are designed to encode local pixel information into an easily matchable vector. Lowe's SIFT descriptor encodes local gradient intensity and orientation information into a 128-value vector. The descriptor building process removes illumination information and limits the impact of image noise. The Euclidean distance between descriptor vectors is used to measure their similarity, with shorter distances representing better matches.

7.3.2 SIFT Detection

Grasped objects are modelled by performing SIFT detection on all twelve images of the object's training data set. The 640 × 480 pixel colour training images are converted to grayscale prior to SIFT detection. David Lowe's SIFT binary is used to perform the detection. The binary outputs a plain text file that contains the descriptors returned from detection. The author's own C++ code is used to match and visualize descriptors.

The SIFT descriptors detected in Figure 7.6(a) are visualized in Figure 7.6(b). Each descriptor is drawn as a blue circle. A circle's radius is proportional to the scale of the descriptor. The location of a descriptor is shown as a blue dot at the center of the circle. Note that the circles are merely a visualization and are not the same size as the Gaussian windows used during descriptor building. The SIFT detection produced 268 descriptors, providing dense coverage of the bottle's pixels. However, numerous background descriptors have also been detected. This is especially noticeable in the top right of the image. To overcome this problem, an automatic pruning step is performed to remove descriptors that do not belong to the grasped object.
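The visualization convention above can be sketched with OpenCV. This is an illustrative drawing routine, not the author's C++ code; the keypoint structure and the scale-to-radius factor are assumptions.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Minimal keypoint record: image location and SIFT scale.
struct SiftKeypoint { float x, y, scale; };

// Draw each descriptor as a blue circle whose radius is proportional to its
// scale, with a dot at the descriptor location, in the style of Figure 7.6(b).
void drawDescriptors(cv::Mat& image, const std::vector<SiftKeypoint>& keypoints)
{
    const cv::Scalar blue(255, 0, 0);   // BGR colour order
    const float radiusPerScale = 6.0f;  // illustrative scaling factor
    for (const SiftKeypoint& kp : keypoints)
    {
        const cv::Point centre(cvRound(kp.x), cvRound(kp.y));
        cv::circle(image, centre, cvRound(kp.scale * radiusPerScale), blue, 1);
        cv::circle(image, centre, 2, blue, -1);  // filled dot at the centre
    }
}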

7.3.3 Removing Background SIFT Descriptors

The problem of background SIFT descriptors is illustrated by Figure 7.7(a). The blue dots in the figure show the location of descriptors returned by SIFT detection. Notice that apart from the grasped bottle, many background descriptors are detected. The inclusion of these descriptors in the object model will result in false positives during object recognition, especially when the robot is operating on objects set against the same background.

Before using the detected SIFT descriptors for object recognition, background descriptors are removed using an automated pruning process. Figure 7.7(b) shows the remaining SIFT descriptors after pruning away background descriptors. Notice that the dense distributions of descriptors in the upper right and lower right of Figure 7.7(a) are no longer present in the refined result. The descriptor belonging to the object's shadow has also been removed. Note however that several descriptors belonging to the L-shaped foam protrusion remain in the refined result.


Figure 7.6: SIFT detection example – White bottle training image. (a) Input image; (b) SIFT descriptors.


Figure 7.7: Removing background SIFT descriptors. (a) All SIFT descriptors; (b) background descriptors removed.


The automatic removal of background descriptors is performed as follows. Firstly, descriptors that are very far away from the grasped object are removed. This is achieved by placing a bounding box around the grasped object and searching for descriptors outside this bounding box. The bounding box is large enough to allow for the object tilt and displacement caused by imperfect grasping. This first step removes the majority of non-object descriptors, such as the dense cluster of descriptors on the right side of Figure 7.7(a).

Secondly, the robot takes advantage of the fact that twelve training images are collected for each object. As the grasped object is rotated in front of a static background, background descriptors should occur much more frequently than object descriptors in the training image set. Generally, a SIFT descriptor remains detectable for one forward and one backward object rotation increment. Therefore, an object descriptor should only match with a maximum of two other descriptors from the training image set, one from the image recorded at the previous rotation step and one from the image recorded at the next rotation step.

Programmatically, this constraint is applied by searching for descriptor matches between the training images. A descriptor with two or fewer matches with descriptors of other training images is identified as an object feature. The remaining descriptors that have three or more matches are considered to be background features and are rejected before object modelling. The ratio-based method described in Section 7.3.4 is used to find descriptor matches.
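A minimal sketch of this frequency-based pruning step, assuming a plain descriptor record and a caller-supplied match test (the ratio-based test of Section 7.3.4 in the thesis); the names and structure are illustrative, not the thesis code.

#include <functional>
#include <vector>

// A SIFT descriptor together with the index of the training image it came from.
struct Descriptor
{
    int imageIndex;                // 0 to 11 for the twelve training views
    std::vector<float> values;     // 128 elements
};

// Keep descriptors that match at most two descriptors from *other* training
// images; descriptors with three or more such matches are treated as
// background features and discarded. The match test (e.g. Lowe's ratio test)
// is supplied by the caller.
std::vector<Descriptor> pruneBackgroundDescriptors(
    const std::vector<Descriptor>& all,
    const std::function<bool(const Descriptor&, const Descriptor&)>& isMatch)
{
    std::vector<Descriptor> objectDescriptors;
    for (const Descriptor& candidate : all)
    {
        int matchCount = 0;
        for (const Descriptor& other : all)
        {
            if (other.imageIndex != candidate.imageIndex && isMatch(candidate, other))
                ++matchCount;
        }
        if (matchCount <= 2)
            objectDescriptors.push_back(candidate);
    }
    return objectDescriptors;
}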

The proposed pruning method for removing background SIFT descriptors can be applied to any situation where an object is rotated at fixed angular increments in front of a static background. The method does not require the rotated object to have any detectable lines of symmetry. The pruning method does not use bilateral symmetry for SIFT descriptor rejection as the robot's gripper may disrupt or shift the grasped object's symmetry axis. Also, the human prosthesis used as the robot's gripper tends to tilt the grasped object due to the uneven closing speeds of its fingers. As such, the object's symmetry line will require tracking during the rotation if it is used for descriptor rejection.

The number of descriptor matches allowed within a training image set should be adjusted depending on the angular size of the rotation increment. Note that the removal of background descriptors reduces the total number of descriptors in the object recognition database. For example, the removal of background descriptors shown in Figure 7.7 reduced the number of descriptors from 268 to 163. Reducing the quantity of descriptors directly reduces the computational cost of object recognition and helps improve recognition robustness by lowering the probability of false positives during descriptor matching.

7.3.4 Object Recognition

Figure 7.8 provides an overview of the robot's object recognition process.


The robot performs object recognition by matching the descriptors detected in an input image with descriptors in an object database. SIFT detection is performed on each of the twelve robot-collected training images, resulting in twelve descriptor sets for each object. Background descriptors are removed from an object's descriptor sets using the method detailed previously in Section 7.3.3 prior to their insertion into the object recognition database.

Figure 7.8: Object recognition using learned SIFT descriptors. (Block diagram: SIFT descriptors detected in the input image are matched against every descriptor set in the object database, where each labelled object holds twelve descriptor sets from training images taken at 0° to 330°; the set with the most matches is best, and its object label and training image are returned as the recognition result.)

SIFT descriptor matching is performed using the ratio-based method suggested by Lowe. Descriptor matching is performed between the descriptors of the input image and each descriptor set in the object database, one set at a time. Matches are computed between the input descriptors and each descriptor set in the database as follows. The Euclidean distances between each input descriptor and the descriptors of the database set are calculated. The descriptor in the database set closest to the input descriptor according to its Euclidean distance is recorded as a possible match. To ensure matching uniqueness, the second-closest descriptor is also found. The closest descriptor is returned as a match if it is much closer than the second-closest descriptor. This criterion is evaluated using the inequality d1 < (N × d2), where d1 and d2 are the Euclidean distances from the input descriptor to the closest and second-closest descriptors. N is set to 0.6 in the C++ implementation.
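A minimal sketch of this ratio test, assuming descriptors are stored as plain 128-element float vectors; the function name is illustrative and only the N = 0.6 threshold is taken from the text.

#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the matching database descriptor for one input
// descriptor, or -1 if Lowe's ratio test rejects the match.
int matchDescriptor(const std::vector<float>& input,
                    const std::vector<std::vector<float>>& databaseSet,
                    double ratio = 0.6)
{
    double best = 1e30, secondBest = 1e30;
    int bestIndex = -1;
    for (std::size_t i = 0; i < databaseSet.size(); ++i)
    {
        double sumSq = 0.0;
        for (std::size_t k = 0; k < input.size(); ++k)
        {
            const double diff = input[k] - databaseSet[i][k];
            sumSq += diff * diff;
        }
        const double distance = std::sqrt(sumSq);
        if (distance < best)
        {
            secondBest = best;
            best = distance;
            bestIndex = static_cast<int>(i);
        }
        else if (distance < secondBest)
        {
            secondBest = distance;
        }
    }
    // Accept only if the closest match is much closer than the second closest:
    // d1 < N * d2 with N = 0.6.
    return (bestIndex >= 0 && best < ratio * secondBest) ? bestIndex : -1;
}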

Object recognition is performed by exhaustively matching the input SIFT descriptors against the descriptor sets in the object database. The set with the most descriptor matches with the input descriptors is considered best. The object label of the best descriptor set is returned as the recognized object. The recognition system also returns the training image that produces the best descriptor set for visual verification purposes.


According to [Lowe, 2004], a minimum of three correct matches is needed for object recognition and localization. Therefore, object recognition will only return a result if three or more descriptor matches are found between the input image and the best matching descriptor set in the database.
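The exhaustive search can be sketched as follows. This reuses the matchDescriptor() sketch from Section 7.3.4 above; the data structures and names are illustrative rather than the thesis implementation.

#include <string>
#include <vector>

// Ratio-test matcher sketched in Section 7.3.4 (declaration only here).
int matchDescriptor(const std::vector<float>& input,
                    const std::vector<std::vector<float>>& databaseSet,
                    double ratio);

// One descriptor set from a single training image, tagged with its object label.
struct DescriptorSet
{
    std::string objectLabel;
    std::vector<std::vector<float>> descriptors;
};

// Return the label of the descriptor set with the most ratio-test matches,
// or an empty string when fewer than three matches are found.
std::string recognizeObject(const std::vector<std::vector<float>>& inputDescriptors,
                            const std::vector<DescriptorSet>& database)
{
    int bestMatches = 0;
    std::string bestLabel;
    for (const DescriptorSet& set : database)
    {
        int matches = 0;
        for (const std::vector<float>& descriptor : inputDescriptors)
        {
            if (matchDescriptor(descriptor, set.descriptors, 0.6) >= 0)
                ++matches;
        }
        if (matches > bestMatches)
        {
            bestMatches = matches;
            bestLabel = set.objectLabel;
        }
    }
    return (bestMatches >= 3) ? bestLabel : std::string();
}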

7.4 Autonomous Object Learning Experiments

The proposed autonomous object learning approach is evaluated using seven test objects. The test objects are shown in Figure 7.9. The test objects are beverage bottles, including two transparent bottles and a reflective glass bottle. Each object was grasped and rotated by the robot to collect training images for object recognition. The entire learning process is performed autonomously by the robot. The only human intervention required is the placing of individual test objects within the robot's reachable workspace at the beginning of each experiment. SIFT detection and background descriptor removal are performed automatically after the conclusion of each learning experiment.

The robot performed autonomous learning on all seven test objects. Videos of the robotic nudge, object grasping and object rotation are available from the multimedia DVD in the learning/grasp folder. The videos are named after the object captions in Figure 7.9. There is a visible mechanical issue in the grasping videos. The object's symmetry line in the camera image is generally no longer vertical after the grasp. This is due to the use of a human prosthesis hand as the robot's gripper, which has an angled wrist. Also, the speeds at which the fingers of the gripper close are uneven. The custom-built bracket shown in Figure 7.1 only partially corrects this issue. The autonomous learning process is not affected by this mechanical issue.

The long pause after the robotic nudge in the object grasping videos is due to the saving of image data to document the experiment. This includes the writing of 200 640 × 480 pixel tracking images to the host PC's hard drive, which takes a considerable amount of time. The data collected by the robot during the learning experiments, including the training images, are available from the learning/data folder. The readme.txt text file in the folder provides further details about the experiment data. Without the saving of experiment data, the grasping can be planned and performed 160ms after the robotic nudge. A video walkthrough of the autonomous learning process, where the saving of experiment data has been reduced, is available from the multimedia DVD as learning/explain/white bottle.avi. The video walkthrough includes a verbal narration of the entire learning process by the thesis author.

7.4.1 Object Recognition Results

After the completion of autonomous learning on all seven test objects shown in Figure 7.9, object recognition experiments are performed using the learned object database. The recognition system is tested using 28 input images, four for each of the seven test objects.


Figure 7.9: Bottles used in object learning and recognition experiments – (a) White, (b) Yellow, (c) Green, (d) Brown, (e) Glass, (f) Cola, (g) Transparent.

Each quartet of input images contains the test object placed against different backgrounds, ranging from plain to cluttered. The orientation of the test object is varied between the four input images. Some input images also include large partial occlusions of the test object.

The object recognition results are available from the learning/sift folder of the attached multimedia DVD. The results are organized into folders according to the test object in the input image. The input images are named inputXX.png where XX is a number from 00 to 03. In general, a higher image number implies greater recognition difficulty. The training image that produced the best matching SIFT descriptor set is named databaseXX.png, where XX corresponds to the input image number.


The object recognition results are visualized in the images named matchXX.png. Again, the XX at the end of the image file name is the same as the corresponding input image number. These images are a vertical concatenation of the input image and the best matching training image. The input image is shown above the matching training image. SIFT descriptor matches are linked using red lines in the image. The label of the object identified in the input image by the robot's object recognition system is shown as green text at the bottom of the image.

Table 7.1 contains the number of good and bad matches obtained during the object recognition experiments. A good match is defined as one where the descriptors are at similar locations on the object in the input image and the matching training image. Similarity in location is determined visually via manual observation. Matches that do not belong to the same part of the object are labelled as bad. For example, Figure 7.14 contains a bad descriptor match between the feature on the bottle cap in the input image and the glossy label in the training image.

Table 7.1: Object recognition results – SIFT descriptor matches

Bottle        Image 00      Image 01      Image 02      Image 03
              Good   Bad    Good   Bad    Good   Bad    Good   Bad
White          16     0      6      0     17      0      7      0
Yellow         14     0     11      0     24      0      4      0
Green          23     1     21      1     11      0      9      1
Brown          15     0     16      0     16      0      8      0
Glass           5     0      6      1      4      1      4      1
Cola            7     0      4      0      9      0     11      0
Transparent     6     0      7      1     11      0      6      0

As only three correct SIFT matches are needed for object recognition and pose estimation, the results in Table 7.1 suggest that SIFT descriptors extracted from autonomously collected training data are sufficient for robust object recognition. The correct object labels returned by the recognition system for all 28 input images further confirm this suggestion. The remainder of this section discusses some of the object recognition results in the learning/sift folder of the multimedia DVD.

Textured Bottles

Figures 7.10 to 7.13 contain several object recognition results for the textured bottle test objects. Figure 7.10 shows the affine invariant nature of SIFT descriptors. The white bottle is successfully recognized in the input image despite its inversion of orientation. In Figure 7.11, the input image shows the yellow bottle in the middle of several other objects. Notice that a textured yellow box and another textured yellow bottle are present in the scene. The object recognition system is able to identify the yellow bottle correctly, with an abundance of descriptor matches.


Figure 7.12 contains a difficult object recognition scenario. The green bottle is heavily occluded by other objects in the scene. Due to its shiny surface, bright white specular reflections are also present on the bottle. Despite these challenges, the object recognition system is able to find numerous descriptor matches and thereby correctly identify the object in the input image as the green bottle. Figure 7.13 contains a similar scenario, where the recognition system is able to identify the partially occluded brown bottle.

Glass and Transparent Bottles

Figures 7.14 to 7.16 contain several recognition results for the glass bottle and the two partially transparent bottles. These objects are difficult to recognize due to their unreliable surface information. The glass bottle and its shiny label produce many specular reflections. The glass bottle was also chosen to show that the robot can autonomously manipulate and model fragile objects with high centers of gravity. The transparent bottles are prone to specular reflections and also change their appearance depending on the scene background.

Figure 7.14 shows an object recognition result for the glass bottle partially occluded by clutter. The object recognition system successfully identifies the glass bottle and returns five SIFT descriptor matches. However, the descriptor match between the bottle cap in the input image and the bottle label in the training image is incorrect. This matching error is probably due to a specular reflection on the bottle cap, which appears similar to the specular reflection on the shiny label. Note that as noise robust methods such as the Hough transform are generally used to recover object pose from SIFT descriptor matches, single matching errors will not adversely affect object localization.

Figure 7.15 shows the half-transparent cola bottle being successfully recognized amongst background clutter. The bottle's liquid content has been purposefully removed to produce a large change in its appearance. Despite emptying the cola bottle, many correct SIFT descriptor matches are found for the object. The recognition system is able to identify the correct object without being adversely affected by the change in object appearance. Finally, the result in Figure 7.16 shows that a mostly transparent bottle can be recognized using the proposed method. The texture on the bottle label appears to be highly distinctive, producing many correct matches.

7.5 Discussion and Chapter Summary

This chapter demonstrated that a robot can autonomously learn new objects through the careful use of object manipulation. The robot was able to roughly determine the height of a nudged object by performing segmentation in stereo. Most importantly, the autonomous grasping of objects has clearly demonstrated that a relatively simple manipulation, the robotic nudge described in the previous chapter, can enable the use of more advanced manipulations.


Figure 7.10: Object recognition result – White bottle (match01.png).


Figure 7.11: Object recognition result – Yellow bottle (match02.png).


Figure 7.12: Object recognition result – Green bottle (match02.png).


Figure 7.13: Object recognition result – Brown bottle (match03.png).


Figure 7.14: Object recognition result – Glass bottle (match03.png).


Figure 7.15: Object recognition result – Cola bottle (match03.png).


Figure 7.16: Object recognition result – Transparent bottle (match02.png).



The robot autonomously collected training data by rotating a grasped object. Experiments show that the robot-collected training data is sufficient to produce reusable object models. By automatically pruning away background descriptors, the robot is able to generate visual models that describe multiple views of a grasped object. Object recognition experiments have clearly demonstrated that the object models produced autonomously by the robot allow robust object recognition. Moreover, the robotic system is able to interact with a fragile glass bottle and partially transparent objects in order to build object models autonomously.

However, there are some issues that should be addressed by future work. Firstly, the proposed learning approach is not designed to deal with asymmetric objects. If a suitable replacement method can be found to autonomously produce object segmentations, the grasping and SIFT model building parts of the approach can be applied without significant change. Secondly, small asymmetric parts of symmetric objects, such as cup handles, may cause grasping failure. This problem requires additional visual sensing designed to locate small object asymmetries. Thirdly, the problem of learning duplicate object models can be addressed by performing object recognition prior to the robotic nudge. If the target object already exists in the robot's database, the robot can simply move on to investigate the next object.

The system in this chapter has integrated fast bilateral symmetry detection, symmetry triangulation, real time object tracking, autonomous segmentation via the robotic nudge, autonomous object grasping and SIFT-based object modelling to produce an autonomous learning system. The general nature of the proposed learning approach should allow the use of other robotic manipulators to perform the necessary object manipulations. It may also be possible for a human teacher to actuate the object. Overall, the proposed approach takes an important step towards greater robot autonomy by shifting the laborious task of object modelling from the human user to the tireless robot.


At bottom, robotics is about us. It is the discipline of emulating our lives, of wondering how we work.

Rod Grupen

8 Conclusion and Future Work

8.1 Summary

Biological optimization through evolution by natural selection can achieve stunningly elegant and intelligent autonomous systems. For example, the tiny honey bee, possessing a brain of only a million neurons, is able to visually distinguish between multiple human faces [Dyer et al., 2005]. Human intelligence combines offline optimization in the form of evolution over millions of years with online learning through the interpretation of sensory data into experiences during a person's life. Our adaptive intelligence is difficult to mimic with a robotic system as it incorporates functions such as reliable motor control, robust visual sensing and real time decision making, all of which are intertwined and ever changing.

Philosophically, human mimicry is an interesting challenge for roboticists. Pragmatically, a domestic robot performing household tasks will benefit from having human-like sensing and sensibilities. While tactile sensors with the density of receptors, robustness and flexibility of human skin are yet unavailable, the same is not true for visual sensors. A study of human optical signals [Koch et al., 2006] indicates that the human eye produces around 8.75 megabits of data per second. This is a fraction of the data rate of a colour webcam capturing 640 × 480 pixel images at 25 frames per second. Ignoring the high dynamic range of the human eye, it appears that current vision sensors are capable of providing the quantity of data needed for human-like visual processing.
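As a rough check of this comparison, assuming uncompressed 24-bit colour frames (an assumption; the webcam's actual output format is not stated):

\[
640 \times 480 \;\text{pixels} \times 24 \;\text{bits/pixel} \times 25 \;\text{frames/s} \approx 1.84 \times 10^{8} \;\text{bits/s} \approx 184 \;\text{Mbit/s},
\]

so the quoted 8.75 Mbit/s of the human eye is only about 5% of the webcam's raw data rate.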

Given the emerging market for domestic robots caused by aging populations in the developed world, as previously discussed at the beginning of Chapter 1, the stage seems set for robotic systems that rely on vision to perform household tasks. A statistical survey of papers in the IROS 2008 conference [Merlet, 2008] provides evidence for this hypothesis. The survey shows that papers with the keywords computer vision and humanoid robot are ranked in the top three in terms of total submissions and accepted papers. While popularity does not necessarily indicate quality or usefulness, it does indicate massive research interest in visual sensing and humanoid robots, with the latter generally equipped with manipulators capable of object interaction.


The need for adaptive and intelligent object interaction motivates the research presented in this thesis. The presented research includes a novel bilateral symmetry detector, model-free object sensing methods, an autonomous object segmentation approach and a robotic system that performs object learning autonomously. Each piece of research has been evaluated using real world images or robotic experiments on common household objects. Recall that the motivating challenges were previously detailed in Section 1.1. The research presented in this thesis addresses these challenges as follows.

Fast and Robust Detection of Bilateral Symmetry

Bilateral symmetry is a visual property that rarely occurs by chance. Symmetry generally arises from symmetric objects or a symmetric constellation of object features, both of which are useful from the standpoint of object interaction. The literature survey of symmetry detection research provided in Chapter 2 revealed an unexplored niche for fast and robust detectors, which are sorely needed for real time robotic applications. The novel bilateral symmetry detection method proposed in Chapter 3 is designed to fill this niche. Overall, the proposed detection method fully addresses both the speed and robustness aspects of the motivating challenge.

The proposed fast bilateral symmetry detection method is the fastest detector at the time of writing. The C++ implementation of the algorithm running on an Intel 1.73GHz Pentium M laptop only requires 45ms to detect symmetry lines of all orientations in a 640 × 480 pixel image. Timing trials conducted against the highly cited generalized symmetry transform [Reisfeld et al., 1995] show that the proposed detection method is roughly 8000 times faster. Additionally, narrowing the angle limits in fast symmetry linearly reduces detection time. These angle limits allow the use of fast symmetry in demanding real time systems, such as the Kalman filter object tracker detailed in Chapter 5.

Some existing methods, such as the SIFT-based method described in [Loy and Eklundh, 2006], are able to operate on camera images of real world scenes. However, these methods rely on symmetry in local gradients and pixel values, which break down in the presence of shadows and specular reflections due to unfavourable lighting. Additionally, pixel values are inherently unreliable for objects with transparent or reflective surfaces. Fast symmetry is able to robustly deal with a wide variety of objects under non-uniform illumination by solely relying on edge pixel locations as input data. Additionally, the use of Canny edge detection [Canny, 1986] in conjunction with convergent voting using the Hough transform [Hough, 1962] provides a high level of robustness against input noise.

Development of Model-Free Object Sensing Methods

The speed and robustness of fast symmetry allows it to be applied to a variety of object sensing problems. Object sensing methods can be formulated in a model-free manner by using visual symmetry as an object feature.


A robot with model-free sensing is able to function without relying on a priori object models. This allows the robot to operate in environments where new objects are present with minimal prior training. Moreover, the large number of symmetric objects in the average household makes bilateral symmetry a useful object feature. The proposed model-free object sensing methods are able to operate quickly and robustly on the kind of sensor data encountered by a domestic robot.

A visual sensing toolbox consisting of model-free object sensing methods has been developed. The static sensing problems of object segmentation and stereo triangulation are addressed in Chapter 4. The proposed segmentation approach uses dynamic programming to extract near-symmetric edge contours. This symmetry-guided segmentation method is fast, requiring an average of 35ms on 640 × 480 test images. Experiments on real world images show that multiple objects can be segmented but the resulting contours may not encompass the entire object. This weakness of the segmentation method, caused by the lack of assumed object models, motivates the robotic nudge approach to segmentation detailed in Chapter 6.

The proposed stereo triangulation method generates a three dimensional symmetry axis from a pair of symmetry lines, one from each camera in a stereo pair. Departing from traditional stereo methods, the proposed approach does not rely on matching local correspondences and does not return 3D information about surfaces. Instead, the triangulated axis of symmetry represents structural information, similar to a three dimensional medial axis. Experiments show that symmetry triangulation is able to operate on objects that confuse traditional stereo methods, such as transparent and reflective objects. Symmetry triangulation is especially useful for surface of revolution objects as the resulting symmetry axis is identical to the axis of revolution. This allows accurate localization of common household objects such as cups and bottles. Additionally, the symmetry axis provides useful information with regards to an object's orientation and provides structural cues that are useful for object manipulation.

To deal with dynamic objects that move within the robot's environment, a real time object tracker is proposed in Chapter 5. The C++ implementation of the tracker is able to operate at 40 frames per second on 640 × 480 video. The tracker also provides a symmetry-refined motion segmentation of the tracked object in real time. Prior to this work, object tracking using bilateral symmetry had not been attempted. With regards to object tracking in general, the level of robustness against affine transformations, occlusions and object transparency exhibited by the symmetry tracker also contributes to the state of the art. A quantitative analysis of symmetry tracking errors indicates that bilateral symmetry is a robust and accurate object feature. Additionally, a qualitative comparison between symmetry and colour suggests that edge-based bilateral symmetry can be used in a complementary manner with other tracking features.


Autonomous Object Segmentation

Chapter 6 proposed an autonomous system that makes use of the object motion induced by a precise robotic nudge to perform object segmentation. While most active vision systems actuate an eye-in-hand camera to gain multiple views of a static scene, the proposed robotic system actuates a target object to obtain segmentations that are true to the physical world. This active approach allows the segmentation of new objects by using bilateral symmetry to triangulate and track objects.

The proposed autonomous segmentation approach is partly inspired by [Fitzpatrick, 2003a], which describes an accidental approach to object discovery and segmentation using a poking action. The proposed approach differs from previous work by intentionally causing object motion through planning prior to robotic action. The planning enables the use of a gentle nudge action that generates predictable object motion, which allows the segmentation and actuation of fragile and top-heavy objects. This differs from previous work that requires the use of unbreakable test objects due to the high speed of contact between the end effector and the test objects.

Fitzpatrick’s approach sometimes produces poor segmentation results that are near-emptyor includes the robot’s end effector. In the proposed approach, such results are preventedby the use of stereo object tracking initiated upon detection of object motion. The pro-posed motion segmentation method uses video images before and after object actuation toprevent the inclusion of the robot’s end effector in the result. Additionally, the proposedmotion segmentation approach is fast, requiring only 80ms to perform a 1280× 960 pixelmotion segmentation. This allows the robot to continue its online functions immediatelyafter object actuation, removing temporal gaps in sensing caused by offline processing.

Object Learning by Robotic Interaction

The robot described in Chapter 7 performs autonomous object learning by integrating the research in previous chapters with new visual sensing and object manipulation techniques. By leveraging the robotic nudge to perform autonomous object segmentation, the robot collects training data on its own by grasping and rotating a nudged object. Object learning is performed by building a 360-degree visual model of grasped objects using SIFT [Lowe, 2004]. Experiments show that the challenge of shifting the burden of object modelling from the human user to the robot has been met for bilaterally symmetric objects.

In addition to meeting the challenge of autonomous object learning, the research also provides the following contributions. Firstly, the work shows that it is possible to leverage a simple object manipulation to perform a more advanced manipulation. This is embodied by the use of the robotic nudge to obtain the necessary information for object grasping, which is a more complex and difficult manipulation. Secondly, the work shows that a robot can collect training data of sufficient quality and quantity to perform object modelling.


Specifically, the robot is able to build a panoramic visual model of grasped objects using SIFT descriptors without having to rely on symmetry or any other trackable feature. Finally, experiments show that the robot is able to perform robust object recognition using autonomously learned object models. This finding confirms that the robot is able to learn useful object models on its own, which is a first step towards intelligent robotic systems that can adapt to dynamic domestic environments by learning new objects autonomously.

8.2 Future Work

The work covered in this thesis spans the areas of symmetry detection, model-free object sensing, autonomous object manipulation and object learning through robotic interaction. Given the wide scope of the thesis, there are many avenues for future research in each of these areas. The future work covered here is primarily focused on encouraging more research on robotic systems that learn by acting.

Fast Bilateral Symmetry Detection

The bilateral symmetry detector proposed in Chapter 3 is fast and robust enough to operate on real world video. However, the old adage of Garbage in, Garbage out still applies. Without edge pixels that correspond closely to the physical boundaries of objects, symmetry detection will invariably fail to find the symmetry lines of objects. In practice, this problem usually manifests as broken edge contours. It may be possible to use approaches such as gradient vector flow (GVF) [Xu and Prince, 1998] to fill in empty space between segments of an edge contour. As GVF has been used successfully to detect symmetry in contours of dots [Prasad and Yegnanarayana, 2004], it seems likely that a vector field approach can be used to improve symmetry detection robustness against broken contours. As vector fields and similar approaches are computationally expensive, the real time performance of detection will probably be reduced by this kind of preprocessing.

Similarly, an excess of edge pixels that do not belong to object contours may lower the signal-to-noise ratio to the point of detection failure. This is especially problematic when the scene contains large patches of high frequency texture. Blurring the image prior to edge detection will help alleviate the problem. However, blurring is a double-edged sword as it will also prevent the detection of edge pixels at low contrast boundaries. It may be possible to remedy the situation by using an edge detection approach that performs structure-preserving noise reduction such as SUSAN [Smith and Brady, 1997]. Additionally, the edge thinning approach employed in SUSAN [Smith, 1995] will help equalize the number of votes cast by edge contours of different thicknesses.

Symmetry-Based Model-Free Object Sensing

While the model-free object sensing methods proposed in Chapters 4 and 5 are self-contained, they also highlight several issues worthy of further investigation.


Firstly, the pendulum experiments in Section 5.4.3 have provided some insight into colour and symmetry as tracking features. The observation that object blurring negatively affects symmetry detection suggests the possibility of using colour features to improve symmetry tracking performance. On the other hand, background distracters similar in hue to the tracking target are severe problems for colour tracking. It may be possible to apply symmetry synergetically to improve colour tracking accuracy and reliability.

Secondly, the comparison results in Section 4.3.6 highlight the different natures of traditional stereo approaches and the proposed symmetry triangulation method. While dense and sparse stereo methods localize three dimensional points on an object's surface, the proposed symmetry method returns a symmetry axis that passes through the object. Apart from the benefit of being able to deal with objects with transparent or reflective surfaces, fusing the results of symmetry and traditional stereo methods may provide additional benefits. For example, a symmetry axis may provide information about an object's orientation that helps improve the search for correspondences by narrowing geometric constraints. Similarly, an object's symmetry axis may benefit higher level model fitting by constraining the geometry of possible solutions.

Autonomous Object Segmentation using the Robotic Nudge

As discussed in Chapter 6, the use of robotic action to actuate objects for the purposes of discovery and segmentation is rare amongst existing literature. Hopefully, the proposed autonomous segmentation method will motivate future works that depart from the de facto approach of camera actuation to approaches that physically interact with objects. With regards to individual steps of the proposed segmentation process, many future directions are available.

Recall that the proposed segmentation process begins with the sensing of interesting locations by stereo triangulation of symmetry axes over multiple video images. This step relies on finding the intersection point between a symmetry axis and a table plane to localize objects. Instead of requiring a priori knowledge of the table plane, a more adaptive approach would be to estimate the geometry of the table online by fitting horizontal planes to dense stereo disparity.

As the test scenes used in the autonomous segmentation experiments only contain a single symmetric object, multi-object scenarios should be explored further. The proposed method performs a planned action in the form of the robotic nudge, which allows the implementation of exploration strategies such as the previously suggested approach of investigating locations near the camera first. The evaluation of exploration strategies can be based on their efficiency as measured by the number of nudges required to completely disambiguate the correspondences between symmetry axes and actual physical objects in a scene. Additionally, the object nudging strategy can be optimized for maintaining maximum object spacing and to keep objects within the robot manipulator's workspace.


Implementing these kinds of exploration strategies will require lower level processes such as obstacle avoidance and end effector path planning. These low level processes must deal with the perpetual 2.5D problem of stereo vision where objects are occluded in one camera view but not the other. Also, in scenes where objects are not spatially sparse enough to allow robust visual sensing, it may be possible to use the robotic nudge primarily as a tool to remove occlusions by pushing the occluding object out of the way.

The robotic nudge relies solely on visual feedback to determine whether an object has been actuated. Force sensors incorporated directly into the robot's end effector will provide additional sensory confirmation upon effector-object contact. This will allow greater control over object actuation. For example, the robotic nudge can be stopped a short time after the force sensors indicate that the end effector has made contact with an object, instead of following the current fixed-length nudge trajectory. Moreover, a maximum force threshold will help prevent damage to the robot manipulator by aborting the robotic nudge when the robot encounters an immovable or heavy object. Additionally, the forces measured during a nudge can be used as an object feature and as a feedback signal to modulate the strength of the nudge.

On a global level, the generalization of the segmentation process to asymmetric objects appears to be a highly useful future direction. Other structural features, such as geometric primitives extracted from dense stereo disparity, may allow the use of a robotic nudge approach to motion segmentation. Instead of using the symmetry line to measure object movement for motion segmentation, other features should be evaluated. For example, sets of SIFT features may allow the recovery of the object's pose in the video images taken before and after the nudge. As the motion segmentation approach will have to be modified when dealing with asymmetric objects, it may be worth departing from monocular methods in favour of stereo methods such as stereo optical flow.

Object Learning by Robotic Interaction

Chapter 7 proposed an autonomous object learning system specifically targeted at symmetric objects, with experiments carried out on beverage bottles of various shapes and appearances. The system leverages the robotic nudge to grasp an object and then visually model the grasped object. Grasping is performed by applying force towards an object's symmetry axis in a perpendicular direction. This produces a stable grasp for a surface of revolution object as its axis of revolution is the same as its symmetry axis. An avenue for future work is to extend the object grasping step to other symmetric objects such as boxes, followed by further generalization to handle asymmetric objects. Additionally, as with the robotic nudge, object grasping will benefit from manipulator path planning that addresses the problem of obstacle avoidance.

After grasping an object, the robot builds an object model by rotating the grasped object while taking images at fixed angular intervals.


Instead of the current approach, where SIFT descriptors are rejected by examining their frequency of occurrence in an object's set of training images, other approaches may provide greater flexibility. For example, a new training image of the grasped object can be captured when most descriptors from the previous training image no longer match with descriptors in the current video image.

Additionally, work can be done to improve the quality of robot-collected training data by performing intelligent manipulation of the grasped object. This concept is exemplified by a paper published during the writing of this thesis [Ude et al., 2008]. The work focused on the sensory-motor task of maintaining an object at the centre of the camera image while minimizing change in visual scale during object modelling, which is carried out by rotating the object in front of a camera.

Overall, as autonomous object learning via robotic interaction is a new area of research, there are many possibilities for future work. With regards to object interaction, the problem of nudging and grasping objects that contain liquid, such as a cup of coffee, is particularly pertinent to domestic robotics. Also, the approach of learning via object interaction can be generalized to other actors, both robotic and human. For example, a human teacher can perform a nudge to actuate objects placed in front of a passive vision system. Extending this concept further, it may be possible to have a robot that learns about objects by observing the way humans interact with objects, such as during dinner. This will provide the robot with many opportunities to perform motion segmentation of moving objects. After sufficient observations, the robot can proceed to interact with previously segmented objects autonomously.


A Multimedia DVD Contents

A multimedia DVD containing videos, images and experiment data accompanies this dissertation. The contents of the DVD are related to the research in Chapters 5, 6 and 7. Detailed discussions of the multimedia content can be found in their respective chapters.

If video playback problems occur, the author recommends the cross platform and open source video player VLC. A Windows XP installer for VLC is available in the root folder of the multimedia DVD. Users of non-Windows operating systems can download VLC for their platform from www.videolan.org/vlc/. The multimedia content provided on the DVD is as follows.

A.1 Real Time Object Tracking

Videos of the tracking results discussed in Section 5.3 are available from the tracking folder. The tracking videos are available as WMV and H264 files from their respective folders. The H264 videos have better image quality than the WMV videos but require more processing power to decode.

Note that the tracking videos are also available from the following website:
• www.ecse.monash.edu.au/centres/irrc/li_iro2006.php

Additionally, all four sets of 1000 pendulum test video images are available from the pendulum folder of the multimedia DVD. Consult the readme.txt text file in the folder for additional information.

A.2 Autonomous Object Segmentation

The autonomous object segmentation results discussed in Section 6.5 are located in the nudge folder. The object segmentation results are tabulated in index.html.


Each row of the table contains an image of the segmentation result alongside corresponding videos of stereo symmetry tracking and the robotic nudge as filmed from an external video camera. Note that the full resolution images are displayed by clicking on an image.

The segmentation results and videos are also available online:
• www.ecse.monash.edu.au/centres/irrc/li_iro08.php

A.3 Object Learning by Interaction

The experimental results of the research presented in Chapter 7 are available from the learning folder of the multimedia DVD. Object interaction videos showing the robotic nudge followed by a grasp and rotation of the nudged object are located in the learning/grasp folder. These videos are named after the object captions in Figure 7.9. The long pause between the nudge and the grasp in these videos is due to the saving of experiment data to document the learning process. This includes the writing of 200 640 × 480 tracking images to disc, which takes a considerable amount of time. The data collected by the robot during the learning experiments, including the training images, are available from the learning/data folder. The readme.txt text file in the folder provides further details about the experiment data.

The object recognition results are available from the learning/sift folder. The results are organized into folders according to the test object in the input image. The input images are named inputXX.png where XX is a number from 00 to 03. The training image that produced the best matching SIFT descriptor set is named databaseXX.png, where XX corresponds to the input image number. The object recognition results are visualized in the images named matchXX.png. Again, the XX at the end of the image file name is the same as the corresponding input image number. These images are a vertical concatenation of the input image and the best matching training image. The input image is shown above the matching training image. SIFT descriptor matches are linked using red lines in the image. The label of the object identified by the robot's object recognition system is shown as green text at the bottom of the image.

Two videos explaining the autonomous grasping of nudged objects are available in the learning/explain folder of the multimedia DVD. Both videos include audio commentary by the thesis author. In combination, these two explanation videos demonstrate that the robot can autonomously grasp symmetric objects of different heights. The blue cup video contains a demonstration where a blue cup is nudged and then grasped autonomously by the robot. The video also shows the graphical user interface of the robotic system, which provides real time visualization of the planned nudge, object segmentation and other pertinent information. The white bottle video shows the robot nudging, grasping and rotating a white bottle. The saving of experiment data such as tracking images has been reduced in this video, which allows the robot to perform the learning process with minimal delay. The object segmentation obtained using the robotic nudge is shown in real time during this video.


B Building a New Controller for the PUMA 260

B.1 Introduction

The author designed and implemented a stand-alone motion controller for the PUMA 260 robot manipulator during the course of thesis research. The new controller is a complete replacement for the default Unimate controller. The controller drives the servo motors, controls the magnetic brake, reads the encoders and also interfaces with external hardware such as the two-fingered gripper used for object grasping. The new controller differs from the PUMA 560 controller detailed in [Moreira et al., 1996], which only attempts to replace the control logic while retaining the analog servo and amplifier modules of the Unimate controller.

The new controller was motivated by two factors. Firstly, the aging Unimate controller in the author's laboratory is prone to overheating, which can lead to unreliable operation during robotic experiments, especially in hot weather. Secondly, the Unimate controller allows real time PC control of the PUMA arm but demands the constant sending of encoder positions via a serial port. This removes vital CPU cycles from visual processing, especially when context switching between the arm and vision threads is considered. The new controller allows the specification of long motion trajectories that are carried out on a PCI motion control card so that no CPU cycles are taken away from time-critical tasks such as real time object tracking during a robotic nudge.

B.2 System Overview

Figure B.1 shows the software and hardware components of the robot used for the experiments in Chapters 6 and 7. Components of the new controller are shown in blue. The two kinematics modules in bold are described in Section B.3.

Advance Motion Controls (AMC) Z6A6DDC PWM servo amplifiers are used to drive the 40V servo motors in the PUMA 260 robot arm.


Figure B.1: Overview of new robot arm controller. (Block diagram: visual processing modules on the desktop PC — fast symmetry detection, object segmentation, real time object tracking and stereo triangulation operating on video frames from the stereo camera pair — provide object poses to robotic actions such as the nudge and grasp; inverse and direct kinematics convert between wrist poses and joint angles; the Galil PCI servo controller, servo amplifier pairs and interconnect modules drive the PUMA 260 manipulator and Ottobock gripper via PWM signals, with encoder counts returned to the controller.)

Six amplifiers are mounted in pairs on AMC MC2XZQD 2-axis interface boards, shown one above the other on the center-right of Figure B.2. Each green amplifier interface board is roughly the size of two credit cards. The two long black modules on the left of Figure B.2 are interconnects that interface the servo amplifiers, arm encoder signals, magnetic brake and robot gripper with the PCI servo controller card. The amplifiers and brake are powered using a 40V 12A power supply.

Figure B.2: New stand-alone controller for the PUMA 260.

A Galil six-axis DMC1860 servo motion control card acts as the brains of the controller. The robot's software interfaces with the card via a C API by giving commands written in Galil's proprietary motion code.


The PCI card provides a hardware PID control system for each axis, allowing individual and collective motion control for all six robot arm axes. Single motor actions such as moving the robot arm out of its cradle position are given using the position absolute (PA) command, which drives motors using a trapezoidal velocity profile to a desired location. More complex movements such as the robotic nudge are performed by generating a list of waypoints that specify the desired motion trajectory, storing this list on the controller's onboard memory and then issuing a linear interpolation (LM) command, which tells the controller to drive the robot arm through the waypoints.

B.3 Kinematics

Software written by the author in C++ is used to perform direct and inverse kinematics. Direct kinematics is used to determine the position and orientation, also called the pose, of the wrist based on the joint angles of the robot manipulator. Conversely, inverse kinematics is used to generate the joint angles necessary to achieve a target wrist pose. An additional homogeneous transformation between the wrist and the end effector is used to achieve a target end effector pose for tasks such as the robotic nudge or object grasping. As the robotic experiments presented in this thesis do not require high manipulator speed, velocity control using Jacobians and the modelling of rigid body dynamics are left for future work.
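As a small worked example of the wrist-to-end-effector step, write T_we for the fixed homogeneous transformation from the wrist to the end effector (notation introduced here for illustration only, not taken from the thesis software). Given a desired end effector pose T_ee, the wrist pose handed to inverse kinematics is

T_{wrist} = T_{ee} \, T_{we}^{-1}

so that composing the solved wrist pose with T_we reproduces the desired end effector pose.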

A gentle introduction to robot manipulator kinematics is available in Chapter 4 of [Craig, 2005]. Chapter 2 of [Tsai, 1999] details the mathematics of serial manipulator kinematics, which includes the popular Denavit-Hartenberg method. Inverse kinematics and motion control for manipulators with redundant degrees of freedom are surveyed in [Siciliano, 1990]. The kinematics described here are developed by modifying the link and joint parameters of a PUMA 560 solution [Paul and Zhang, 1986] to adapt the solution to the PUMA 260. Simulations were carried out using Peter Corke's Robotics Toolbox [Corke, 1996] prior to implementing the kinematics as C++ software on the robot.

B.3.1 PUMA 260 Physical Parameters

Kinematic calculations require the physical parameters of the robot manipulator. While these parameters are readily available for the PUMA 560 from publications such as [Lee and Ziegler, 1984] and [Lee, 1982], published records for the PUMA 260 are quite rare. The distances between joints are provided in a book about real time vision and manipulator control [Andersson, 1988], where a PUMA 260 is adapted to play ping pong. Thanks to Giorgio Metta of LIRA-Lab at Genova University, Italy, the author was also directed towards a real time control library from McGill University [Lloyd, 2002]. The physical parameters of the PUMA 260 used in the kinematic calculations to follow are taken from the latter source.


The distance between the shoulder joint and the elbow joint is 0.2032 metres. The manipulator's forearm also has the same length. The distance offset between the shoulder joint and the wrist axis along the forearm is 0.12624 metres. In Denavit-Hartenberg (DH) notation, with the same link numbering convention used in Table 1 of [Paul and Zhang, 1986], the PUMA 260 link and joint parameters are shown in Table B.1. Notice that unlike the PUMA 560 DH table, the PUMA 260 has a zero a value for link 3.

Table B.1: PUMA 260 link and joint parameters

Link number   α (degrees)   a (metres)   d (metres)
1                  90            0             0
2                   0            0.2032        0
3                 -90            0             0.12624
4                  90            0             0.2032
5                 -90            0             0
6                   0            0             0
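As an illustration only (the struct and array names below are not from the thesis software), Table B.1 might be held in the C++ kinematics code as a small constant table, with the link twists converted to radians for the trigonometric functions used in the equations that follow:

#include <array>
#include <cmath>

// One row of Table B.1 (classic Denavit-Hartenberg parameters).
struct DHLink {
    double alpha;  // link twist in radians
    double a;      // link length in metres
    double d;      // joint offset in metres
};

constexpr double kDeg = M_PI / 180.0;

// PUMA 260 link and joint parameters from Table B.1.
const std::array<DHLink, 6> kPuma260Links = {{
    { 90.0 * kDeg, 0.0,    0.0     },  // link 1
    {  0.0 * kDeg, 0.2032, 0.0     },  // link 2
    {-90.0 * kDeg, 0.0,    0.12624 },  // link 3
    { 90.0 * kDeg, 0.0,    0.2032  },  // link 4
    {-90.0 * kDeg, 0.0,    0.0     },  // link 5
    {  0.0 * kDeg, 0.0,    0.0     },  // link 6
}};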

B.3.2 Direct Kinematics

Also known as forward kinematics, direct kinematics is used to determine the position and orientation of the manipulator wrist from known joint angles. As mentioned earlier, the direct kinematics solution is based on the PUMA 560 kinematics described in [Paul and Zhang, 1986]. PUMA 560 direct kinematics is also briefly discussed in a tutorial format on pages 78 to 83 of [Craig, 2005].

Direct kinematics performs 3D homogeneous transformations from one joint to the next, moving from the shoulder to the wrist. The matrix Ti represents the transformation across a single link. The αi, ai and di values are the link and joint parameters for the ith link in Table B.1. θi is the joint angle for link i, which is the input data to direct kinematics.

T_i =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i \cos\alpha_i & \sin\theta_i \sin\alpha_i & a_i \cos\theta_i \\
\sin\theta_i & \cos\theta_i \cos\alpha_i & -\cos\theta_i \sin\alpha_i & a_i \sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{bmatrix}    (B.1)

The Ti matrix is calculated for all six links of the PUMA 260. After this, the position and orientation of the manipulator wrist can be calculated using the transformation matrix Tw, which gives the transformation from the manipulator shoulder joint to the wrist. Tw is the output of direct kinematics.

T_w = T_1 T_2 T_3 T_4 T_5 T_6    (B.2)
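A minimal C++ sketch of equations B.1 and B.2 is shown below. It reuses the illustrative DHLink table from the previous sketch together with a bare 4x4 matrix type; the names are assumptions for this appendix, not the thesis implementation.

#include <array>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Equation B.1: homogeneous transformation across a single link.
Mat4 linkTransform(const DHLink& link, double theta) {
    const double ct = std::cos(theta),      st = std::sin(theta);
    const double ca = std::cos(link.alpha), sa = std::sin(link.alpha);
    return {{{ ct, -st * ca,  st * sa, link.a * ct },
             { st,  ct * ca, -ct * sa, link.a * st },
             { 0.0,      sa,       ca, link.d      },
             { 0.0,     0.0,      0.0, 1.0         }}};
}

// Plain 4x4 matrix product used to chain the link transforms.
Mat4 multiply(const Mat4& A, const Mat4& B) {
    Mat4 C{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                C[r][c] += A[r][k] * B[k][c];
    return C;
}

// Equation B.2: chain the six link transforms to obtain the wrist pose Tw.
Mat4 directKinematics(const std::array<double, 6>& jointAngles) {
    Mat4 Tw = linkTransform(kPuma260Links[0], jointAngles[0]);
    for (int i = 1; i < 6; ++i)
        Tw = multiply(Tw, linkTransform(kPuma260Links[i], jointAngles[i]));
    return Tw;
}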


B.3.3 Inverse Kinematics

Inverse kinematics is the problem of calculating the robotic manipulator joint angles necessary to get the wrist to a target position and orientation, also known as the wrist pose. As opposed to the relatively straightforward mathematics of direct kinematics, inverse kinematics can be difficult for manipulators with high degrees of freedom such as the PUMA 260. Firstly, the existence of solutions must be tested mathematically to ensure that the required wrist pose is reachable within the manipulator's workspace. Secondly, due to the redundancies in joint actuation provided by the six-jointed PUMA 260, the problems of multiple correct solutions and singularities when moving between solutions must be addressed.

The spherical wrist of the PUMA 260 helps simplify the inverse kinematics calculations, as the first three joints control the position of the wrist while the wrist joints control the orientation. The input to inverse kinematics is the desired 3D homogeneous transformation from the base shoulder joint of the manipulator to its wrist. Defining the desired transformation matrix as Td, elements within the matrix are labelled as follows.

T_d =
\begin{bmatrix}
t_{11} & t_{21} & t_{31} & t_{41} \\
t_{12} & t_{22} & t_{32} & t_{42} \\
t_{13} & t_{23} & t_{33} & t_{43} \\
t_{14} & t_{24} & t_{34} & t_{44}
\end{bmatrix}    (B.3)

The desired (Px, Py, Pz) position of the manipulator wrist is represented by t41, t42 and t43 respectively. Two three-element orientation vectors are defined as O = [t21 t22 t23]^T and A = [t31 t32 t33]^T. The elements of O from top to bottom are named Ox, Oy and Oz. The same naming convention is used for the elements of A.

The following inverse kinematics solution puts the manipulator in a right-handed, elbow-up and wrist no-flip configuration as described in [Paul and Zhang, 1986]. A similar naming convention to the solution of Paul and Zhang is used below. Again, αi, ai and di are the physical parameters of the ith link taken from Table B.1.

Joint 1 Angle (θ1)

r = \sqrt{P_x^2 + P_y^2}

\theta_1 = \tan^{-1}\left(\frac{P_y}{P_x}\right) + \sin^{-1}\left(\frac{d_3}{r}\right)    (B.4)


Joint 2 Angle (θ2)

V_{114} = P_x \cos\theta_1 + P_y \sin\theta_1

\psi = \cos^{-1}\left(\frac{a_2^2 - d_4^2 + V_{114}^2 + P_z^2}{2 a_2 \sqrt{V_{114}^2 + P_z^2}}\right)

\theta_2 = \tan^{-1}\left(\frac{P_z}{V_{114}}\right) + \psi    (B.5)

Joint 3 Angle (θ3)

\theta_3 = \tan^{-1}\left(\frac{V_{114} \cos\theta_2 + P_z \sin\theta_2 - a_2}{P_z \cos\theta_2 - V_{114} \sin\theta_2}\right)    (B.6)

Joint 4 Angle (θ4)

V_{113} = A_x \cos\theta_1 + A_y \sin\theta_1

V_{323} = A_y \cos\theta_1 - A_x \sin\theta_1

V_{313} = V_{113} \cos(\theta_2 + \theta_3) + A_z \sin(\theta_2 + \theta_3)

\theta_4 = \tan^{-1}\left(\frac{V_{323}}{V_{313}}\right)    (B.7)

Joint 5 Angle (θ5)

\theta_5 = \tan^{-1}\left(\frac{V_{313} \cos\theta_4 + V_{323} \sin\theta_4}{V_{113} \sin(\theta_2 + \theta_3) - A_z \cos(\theta_2 + \theta_3)}\right)    (B.8)

Joint 6 Angle (θ6)

V_{112} = O_x \cos\theta_1 + O_y \sin\theta_1

V_{132} = O_x \sin\theta_1 - O_y \cos\theta_1

V_{312} = V_{112} \cos(\theta_2 + \theta_3) + O_z \sin(\theta_2 + \theta_3)

V_{332} = -V_{112} \sin(\theta_2 + \theta_3) + O_z \cos(\theta_2 + \theta_3)

V_{412} = V_{312} \cos\theta_4 - V_{132} \sin\theta_4

V_{432} = V_{312} \sin\theta_4 + V_{132} \cos\theta_4

\theta_6 = \tan^{-1}\left(\frac{V_{412} \cos\theta_5 + V_{332} \sin\theta_5}{V_{432}}\right)    (B.9)
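The six joint equations translate almost line for line into C++. The sketch below is illustrative rather than the thesis code: it assumes the Mat4 type and kPuma260Links table from the earlier sketches, extracts Px, Py, Pz, O and A from Td following the element labelling of equation B.3, and uses atan2 in place of the plain arctangent so that each angle lands in the correct quadrant.

#include <array>
#include <cmath>

// Closed-form inverse kinematics (equations B.4 to B.9) for the
// right-handed, elbow-up, wrist no-flip configuration.
std::array<double, 6> inverseKinematics(const Mat4& Td) {
    const double a2 = kPuma260Links[1].a;   // 0.2032 m
    const double d3 = kPuma260Links[2].d;   // 0.12624 m
    const double d4 = kPuma260Links[3].d;   // 0.2032 m

    // Position and orientation vectors taken from Td; in equation B.3 the
    // element t_jk sits in row k, column j of the matrix.
    const double Px = Td[0][3], Py = Td[1][3], Pz = Td[2][3];
    const double Ox = Td[0][1], Oy = Td[1][1], Oz = Td[2][1];
    const double Ax = Td[0][2], Ay = Td[1][2], Az = Td[2][2];

    // Equation B.4
    const double r  = std::sqrt(Px * Px + Py * Py);
    const double t1 = std::atan2(Py, Px) + std::asin(d3 / r);

    // Equation B.5
    const double V114 = Px * std::cos(t1) + Py * std::sin(t1);
    const double psi  = std::acos((a2 * a2 - d4 * d4 + V114 * V114 + Pz * Pz) /
                                  (2.0 * a2 * std::sqrt(V114 * V114 + Pz * Pz)));
    const double t2   = std::atan2(Pz, V114) + psi;

    // Equation B.6
    const double t3 = std::atan2(V114 * std::cos(t2) + Pz * std::sin(t2) - a2,
                                 Pz * std::cos(t2) - V114 * std::sin(t2));

    // Equation B.7
    const double V113 = Ax * std::cos(t1) + Ay * std::sin(t1);
    const double V323 = Ay * std::cos(t1) - Ax * std::sin(t1);
    const double V313 = V113 * std::cos(t2 + t3) + Az * std::sin(t2 + t3);
    const double t4   = std::atan2(V323, V313);

    // Equation B.8
    const double t5 = std::atan2(V313 * std::cos(t4) + V323 * std::sin(t4),
                                 V113 * std::sin(t2 + t3) - Az * std::cos(t2 + t3));

    // Equation B.9
    const double V112 = Ox * std::cos(t1) + Oy * std::sin(t1);
    const double V132 = Ox * std::sin(t1) - Oy * std::cos(t1);
    const double V312 = V112 * std::cos(t2 + t3) + Oz * std::sin(t2 + t3);
    const double V332 = -V112 * std::sin(t2 + t3) + Oz * std::cos(t2 + t3);
    const double V412 = V312 * std::cos(t4) - V132 * std::sin(t4);
    const double V432 = V312 * std::sin(t4) + V132 * std::cos(t4);
    const double t6   = std::atan2(V412 * std::cos(t5) + V332 * std::sin(t5), V432);

    return {{ t1, t2, t3, t4, t5, t6 }};
}

On the real system, the resulting angles would still be checked for reachability and joint limits, then converted to encoder counts before being sent to the Galil card.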


References

[Andersson, 1988] Russell L. Andersson. A Robot Ping-Pong Player: Experiment in Real-Time Intelligent Control. The MIT Press, 1988.

[Atallah, 1985] M.J. Atallah. On symmetry detection. IEEE Transactions on Computers, 34:663–666, 1985.

[Ballard, 1981] Dana H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111–122, 1981.

[Bar-Shalom et al., 2002] Yaakov Bar-Shalom, Thiagalingam Kirubarajan, and X.-Rong Li. Estimation with Applications to Tracking and Navigation. John Wiley & Sons, Inc., 2002.

[Barnes and Zelinsky, 2004] N. Barnes and A. Zelinsky. Fast radial symmetry speed sign detection and classification. In Proceedings of the IEEE Intelligent Vehicles Symposium, pages 566–571, Parma, Italy, June 2004.

[Blum and Nagel, 1978] Harry Blum and Roger N. Nagel. Shape description using weighted symmetric axis features. Pattern Recognition, 10:167–180, 1978.

[Blum, 1964] Harry Blum. A transformation for extracting descriptors of shape. In Proceedings of Meeting Held. MIT Press, November 1964.

[Blum, 1967] Harry Blum. A transformation for extracting descriptors of shape. In Proceedings of Symposium on Models for the Perception of Speech and Visual Form, pages 153–171, Cambridge, MA, USA, November 1967.

[Borgerfors, 1986] Gunilla Borgerfors. Distance transformations in digital images. Computer Vision, Graphics, and Image Processing, 34:344–371, 1986.

[Borras, 2006] Ricard Borras. OpenCV cvBlobsLib binary blob extraction library. Online, November 2006. URL: http://opencvlibrary.sourceforge.net/cvBlobsLib.

[Bouguet, 2006] Jean-Yves Bouguet. Camera calibration toolbox for Matlab. Online, July 2006. URL: http://www.vision.caltech.edu/bouguetj/calibdoc/.


[Bradski, 1998] Gary R. Bradski. Computer vision face tracking for use in a perceptual user interface. Intel Technology Journal, Q2:214–219, 1998.

[Brady and Asada, 1984] Michael Brady and Haruo Asada. Smoothed local symmetries and their implementation. Technical report, MIT, Cambridge, MA, USA, 1984.

[Brooks, 1981] Rodney A. Brooks. Symbolic reasoning among 3-D models and 2-D images. Artificial Intelligence, 17(1-3):285–348, 1981.

[Brown et al., 2003] M.Z. Brown, D. Burschka, and G.D. Hager. Advances in computational stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):993–1008, 2003.

[Buss et al., 2008] Martin Buss, Henrik Christensen, and Yoshihiko Nakamura, editors. IROS 2008 Workshop on Robot Services in Aging Society, Nice, France, September 2008.

[Canny, 1986] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698, 1986.

[Cham and Cipolla, 1994] Tat-Jen Cham and Roberto Cipolla. Skewed symmetry detection through local skewed symmetries. In Proceedings of the British Machine Vision Conference (BMVC), volume 2, pages 549–558, Surrey, UK, 1994. BMVA Press.

[Christensen, 2008] Henrik I. Christensen. Robotics as an enabler for aging in place. In Robot Services in Aging Society IROS 2008 Workshop, Nice, France, September 2008.

[Cole and Yap, 1987] Richard Cole and Chee-Keng Yap. Shape from probing. Journal of Algorithms, 8(1):19–38, 1987.

[Corke, 1996] Peter I. Corke. A robotics toolbox for Matlab. IEEE Robotics and Automation Magazine, 3(1):24–32, March 1996. URL: http://petercorke.com/Robotics%20Toolbox.html.

[Cornelius and Loy, 2006] Hugo Cornelius and Gareth Loy. Detecting bilateral symmetry in perspective. In Proceedings of Conference on Computer Vision and Pattern Recognition Workshop, page 191, Los Alamitos, CA, USA, 2006. IEEE Computer Society.

[Craig, 2005] John J. Craig. Introduction to Robotics: Mechanics and Control. Pearson Prentice Hall, 2005.

[Duda and Hart, 1972] Richard O. Duda and Peter E. Hart. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM, 15(1):11–15, 1972.

[Dyer et al., 2005] Adrian G. Dyer, Christa Neumeyer, and Lars Chittka. Honeybee (Apis mellifera) vision can discriminate between and recognise images of human faces. Journal of Experimental Biology, 208:4709–4714, 2005.


[Fitzpatrick and Metta, 2003] Paul Fitzpatrick and Giorgio Metta. Grounding vision through experimental manipulation. In Philosophical Transactions of the Royal Society: Mathematical, Physical, and Engineering Sciences, pages 2165–2185, 2003.

[Fitzpatrick, 2003a] Paul Fitzpatrick. First contact: an active vision approach to segmentation. In Proceedings of Intelligent Robots and Systems (IROS), volume 3, pages 2161–2166, Las Vegas, Nevada, October 2003. IEEE.

[Fitzpatrick, 2003b] Paul Fitzpatrick. From First Contact to Close Encounters: A Developmentally Deep Perceptual System for a Humanoid Robot. PhD thesis, Massachusetts Institute of Technology, 2003.

[Forsyth and Ponce, 2003] David A. Forsyth and Jean Ponce. Computer Vision - A Modern Approach. Alan Apt, 2003.

[Gupta et al., 2005] Abhinav Gupta, V. Shiv Naga Prasad, and Larry S. Davis. Extracting regions of symmetry. In IEEE International Conference on Image Processing (ICIP), volume 3, pages 133–136, Genova, Italy, September 2005.

[Harris and Stephens, 1988] C. Harris and M. Stephens. A combined corner and edge detector. In Proceedings of The Fourth Alvey Vision Conference, pages 147–151, Manchester, UK, September 1988.

[Hough, 1962] P.V.C. Hough. Method and means for recognizing complex patterns. United States Patent, December 1962. 3,069,654.

[Huang et al., 2002] Yu Huang, Thomas S. Huang, and Heinrich Niemann. A region-based method for model-free object tracking. In International Conference on Pattern Recognition, pages 592–595, Quebec, Canada, August 2002.

[Huebner, 2007] Kai Huebner. Object description and decomposition by symmetry hierarchies. In International Conferences in Central Europe on Computer Graphics, Visualization and Computer Vision, Bory, Czech Republic, January 2007.

[Illingworth and Kittler, 1988] J. Illingworth and J. Kittler. A survey of the Hough transform. Computer Vision, Graphics, and Image Processing, 44(1):87–116, 1988.

[Intel, 2006] Intel. OpenCV: Open source computer vision library. Online, November 2006. URL: http://www.intel.com/technology/computing/opencv/.

[Jia and Erdmann, 1998] Yan-Bin Jia and Michael Erdmann. Observing pose and motion through contact. In Proceedings of IEEE International Conference on Robotics and Automation, volume 1, pages 723–729, Leuven, Belgium, May 1998.

[K. S. Arun and Blostein, 1987] K. S. Arun, T. S. Huang, and S. D. Blostein. Least-squares fitting of two 3-D point sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9:698–700, 1987.


[Kiryati and Gofman, 1998] Nahum Kiryati and Yossi Gofman. Detecting symmetry in grey level images: The global optimization approach. International Journal of Computer Vision, 29(1):29–45, 1998.

[Kleeman, 1996] Lindsay Kleeman. Understanding and applying Kalman filtering. In Proceedings of the Second Workshop on Perceptive Systems, Curtin University of Technology, Perth, Western Australia, January 1996.

[Koch et al., 2006] Kristin Koch, Judith McLean, Ronen Segev, Michael A. Freed, Michael J. Barry, Vijay Balasubramanian, and Peter Sterling. How much the eye tells the brain. Current Biology, 16:1428–1434, 2006.

[Kovesi, 1997] Peter Kovesi. Symmetry and asymmetry from local phase. In Tenth Australian Joint Conference on Artificial Intelligence, pages 185–190, Perth, Australia, December 1997.

[Laurie J. Heyer and Yooseph, 1999] Laurie J. Heyer, Semyon Kruglyak, and Shibu Yooseph. Exploring expression data: Identification and analysis of coexpressed genes. Genome Research, 9:1106–1115, 1999.

[Lee and Ziegler, 1984] C.S.G. Lee and M. Ziegler. Geometric approach in solving inverse kinematics of PUMA robots. IEEE Transactions on Aerospace and Electronic Systems, 6:695–706, 1984.

[Lee et al., 2001] Bin Lee, Jia-Yong Yan, and Tian-Ge Zhuang. A dynamic programming based algorithm for optimal edge detection in medical images. In Proceedings of the International Workshop on Medical Imaging and Augmented Reality, pages 193–198, Hong Kong, China, June 2001.

[Lee, 1982] C.S.G. Lee. Robot arm kinematics, dynamics, and control. Computer, 15(12):62–80, 1982.

[Lei and Wong, 1999] Y. Lei and K.C. Wong. Detection and localisation of reflectional and rotational symmetry under weak perspective projection. Pattern Recognition, 32(2):167–180, February 1999.

[Levitt, 1984] Tod S. Levitt. Domain independent object description and decomposition. In National Conference on Artificial Intelligence (AAAI), pages 207–211, Austin, Texas, USA, August 1984.

[Li and Kleeman, 2006a] Wai Ho Li and Lindsay Kleeman. Fast stereo triangulation using symmetry. In Australasian Conference on Robotics and Automation, Auckland, New Zealand, December 2006. Online. URL: http://www.araa.asn.au/acra/acra2006/.

[Li and Kleeman, 2006b] Wai Ho Li and Lindsay Kleeman. Real time object tracking using reflectional symmetry and motion. In IEEE/RSJ Conference on Intelligent Robots and Systems, pages 2798–2803, Beijing, China, October 2006.


[Li and Kleeman, 2008] Wai Ho Li and Lindsay Kleeman. Autonomous segmentation of near-symmetric objects through vision and robotic nudging. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3604–3609, Nice, France, September 2008.

[Li et al., 2005] Wai Ho Li, Alan M. Zhang, and Lindsay Kleeman. Fast global reflectional symmetry detection for robotic grasping and visual tracking. In Claude Sammut, editor, Australasian Conference on Robotics and Automation, Sydney, December 2005. Online. URL: http://www.cse.unsw.edu.au/~acra2005/.

[Li et al., 2006] Wai Ho Li, Alan M. Zhang, and Lindsay Kleeman. Real time detection and segmentation of reflectionally symmetric objects in digital images. In IEEE/RSJ Conference on Intelligent Robots and Systems, pages 4867–4873, Beijing, China, October 2006.

[Li et al., 2008] Wai Ho Li, Alan M. Zhang, and Lindsay Kleeman. Bilateral symmetry detection for real-time robotics applications. International Journal of Robotics Research, 27(7):785–814, July 2008.

[Lloyd, 2002] John Lloyd. Robot control C library. Online, 2002. URL: http://www.cs.ubc.ca/~lloyd/rccl.html.

[Lowe, 2004] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, November 2004.

[Loy and Eklundh, 2006] Gareth Loy and Jan-Olof Eklundh. Detecting symmetry and symmetric constellations of features. In Proceedings of European Conference on Computer Vision (ECCV), Graz, Austria, May 2006.

[Loy and Zelinsky, 2003] Gareth Loy and Alexander Zelinsky. Fast radial symmetry for detecting points of interest. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(8):959–973, 2003.

[Loy, 2003] Gareth Loy. Computer Vision to See People: a basis for enhanced human computer interaction. PhD thesis, Australian National University, January 2003.

[Lucas and Kanade, 1981] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In International Joint Conference on Artificial Intelligence, pages 674–679, Vancouver, British Columbia, Canada, April 1981.

[Makram-Ebeid, 2000] Sherif Makram-Ebeid. Digital image processing method for automatic extraction of strip-shaped objects. United States Patent, October 2000. Number 6134353, URL: http://www.freepatentsonline.com/6134353.html.

[Marola, 1989a] G. Marola. On the detection of the axes of symmetry of symmetric and almost symmetric planar images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(1):104–108, 1989.


[Marola, 1989b] Giovanni Marola. Using symmetry for detecting and locating objects in a picture. Computer Vision, Graphics and Image Processing, 46(2):179–195, 1989.

[Matas et al., 2002] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of the British Machine Vision Conference (BMVC), pages 384–393, Tübingen, Germany, November 2002.

[Merlet, 2008] J.P. Merlet. Statistics for IROS 2008 and proposals for improving the review process. Online PDF, August 2008.

[Mikolajczyk and Schmid, 2005] Krystian Mikolajczyk and Cordelia Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615–1630, October 2005.

[Mikolajczyk et al., 2005] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65(1/2):43–72, 2005.

[Mohan and Nevatia, 1992] Rakesh Mohan and Ramakant Nevatia. Perceptual organization for scene segmentation and description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14:616–635, 1992.

[Moll and Erdmann, 2001] Mark Moll and Michael A. Erdmann. Reconstructing shape from motion and tactile sensors. In Proceedings of International Conference on Intelligent Robots and Systems (IROS), Maui, HI, USA, October 2001.

[Moller, 1997] Tomas Moller. A fast triangle-triangle intersection test. Journal of Graphics Tools, 2(2):25–30, 1997. URL: http://www.cs.lth.se/home/Tomas_Akenine_Moller/.

[Moreira et al., 1996] Nuno Moreira, Paulo Alvito, and Pedro Lima. First steps towards an open control architecture for a PUMA 560. In Proceedings of CONTROLO Conference, 1996.

[Mortensen et al., 1992] E. Mortensen, B. Morse, W. Barrett, and J. Udupa. Adaptive boundary detection using live-wire two-dimensional dynamic programming. In IEEE Proceedings of Computers in Cardiology, pages 635–638, Durham, North Carolina, October 1992.

[Nagel, 1978] H. H. Nagel. Formation of an object concept by analysis of systematic time variations in the optically perceptible environment. Computer Graphics and Image Processing, 7(2):149–194, April 1978.

[Nalwa, 1988a] Vishvjit S. Nalwa. Line-drawing interpretation: A mathematical framework. International Journal of Computer Vision, 2(2):103–124, September 1988.

[Nalwa, 1988b] Vishvjit S. Nalwa. Line-drawing interpretation: Straight lines and conic sections. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(4):514–529, 1988.


[Nalwa, 1989] Vishvjit S. Nalwa. Line-drawing interpretation: Bilateral symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(10):1117–1120, 1989.

[Nevatia and Binford, 1977] Ramakant Nevatia and Thomas O. Binford. Description and recognition of curved objects. Artificial Intelligence, 8:77–98, 1977.

[Ogawa, 1991] Hideo Ogawa. Symmetry analysis of line drawings using the Hough transform. Pattern Recognition Letters, 12(1):9–12, 1991.

[Pal and Pal, 1993] N.R. Pal and S.K. Pal. A review on image segmentation techniques. Pattern Recognition, 26(9):1277–1294, September 1993.

[Paul and Zhang, 1986] Richard P. Paul and Hong Zhang. Computationally efficient kinematics for manipulators with spherical wrists based on the homogeneous transformation representation. International Journal of Robotics Research, 5:32–44, 1986.

[Pfaltz and Rosenfeld, 1967] John L. Pfaltz and Azriel Rosenfeld. Computer representation of planar regions by their skeletons. Communications of the ACM, 10(2):119–122, 1967.

[Ponce, 1990] Jean Ponce. On characterizing ribbons and finding skewed symmetries. Computer Vision, Graphics, and Image Processing, 52(3):328–340, December 1990.

[Prasad and Yegnanarayana, 2004] V. Shiv Naga Prasad and B. Yegnanarayana. Finding axes of symmetry from potential fields. IEEE Transactions on Image Processing, 13(12):1559–1566, December 2004.

[Ray et al., 2008] Celine Ray, Francesco Mondada, and Roland Siegwart. What do people expect from robots? In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 3816–3821, Nice, France, September 2008.

[Reisfeld et al., 1995] D. Reisfeld, H. Wolfson, and Y. Yeshurun. Context-free attentional operators: The generalized symmetry transform. International Journal of Computer Vision, Special Issue on Qualitative Vision, 14(2):119–130, March 1995.

[Rosenfeld and Pfaltz, 1966] Azriel Rosenfeld and John L. Pfaltz. Sequential operations in digital picture processing. Journal of the Association for Computing Machinery, 13:471–494, 1966.

[Rosenfeld and Pfaltz, 1968] Azriel Rosenfeld and John L. Pfaltz. Distance functions on digital images. Pattern Recognition, 1:33–61, 1968.

[Satoh et al., 2004] Yoshinori Satoh, Takayuki Okatani, and Koichiro Deguchi. A color-based tracking by Kalman particle filter. In Proceedings of International Conference on Pattern Recognition, pages 502–505, Cambridge, UK, August 2004.

[Scharstein and Szeliski, 2001] Daniel Scharstein and Richard Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Technical Report MSR-TR-2001-81, Microsoft Research, Microsoft Corporation, November 2001.


[Siciliano, 1990] Bruno Siciliano. Kinematic control of redundant robot manipulators: A tutorial. Journal of Intelligent and Robotic Systems, 3:202–212, 1990.

[Skarbek and Koschan, 1994] Wladyslaw Skarbek and Andreas Koschan. Colour image segmentation — a survey. Technical report, Institute for Technical Informatics, Technical University of Berlin, October 1994.

[Smith and Brady, 1997] Stephen M. Smith and J. Michael Brady. SUSAN - a new approach to low level image processing. International Journal of Computer Vision, 23:45–78, 1997.

[Smith, 1995] S.M. Smith. Edge thinning used in the SUSAN edge detector. Technical Report TR95SMS5, Oxford Centre for Functional Magnetic Resonance Imaging of the Brain (FMRIB), 1995.

[Swain and Ballard, 1991] Michael J. Swain and Dana H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11–32, 1991.

[Taylor, 2004] Geoffrey Taylor. Robust Perception and Control for Humanoid Robots in Unstructured Environments using Vision. PhD thesis, Monash University, Melbourne, Australia, 2004.

[Tsai, 1999] Lung-Wen Tsai. Robot Analysis - The Mechanics of Serial and Parallel Manipulators. John Wiley and Sons Inc, 1999.

[Ude et al., 2008] Ales Ude, Damir Omrcen, and Gordon Cheng. Making object learning and recognition an active process. International Journal of Humanoid Robotics, 5:267–286, 2008. Special Issue: Towards Cognitive Humanoid Robots.

[Viola and Jones, 2001] Paul Viola and Michael J. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Kauai Marriott, Hawaii, USA, December 2001.

[Wang and Suter, 2003] Hanzi Wang and David Suter. Using symmetry in robust model fitting. Pattern Recognition Letters, 24(16):2953–2966, 2003.

[Wang and Suter, 2005] Hanzi Wang and David Suter. A re-evaluation of mixture-of-Gaussian background modeling. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 1017–1020, Pennsylvania, USA, March 2005.

[Westhoff et al., 2005] D. Westhoff, K. Huebner, and J. Zhang. Robust illumination-invariant features by quantitative bilateral symmetry detection. In Proceedings of the IEEE International Conference on Information Acquisition (ICIA), Hong Kong, June 2005.

[Xu and Oja, 1993] L. Xu and E. Oja. Randomized Hough transform (RHT): Basic mechanisms, algorithms, and computational complexities. Computer Vision, Graphics, and Image Processing, 57(2):131–154, March 1993.


[Xu and Prince, 1998] Chenyang Xu and Jerry L. Prince. Snakes, shapes, and gradient vector flow. IEEE Transactions on Image Processing, 7(7):359–369, March 1998.

[Yan and Kassim, 2004] P. Yan and A.A. Kassim. Medical image segmentation with minimal path deformable models. In Proceedings of the International Conference on Image Processing (ICIP), volume 4, pages 2733–2736, Singapore, October 2004.

[Yip et al., 1994] Raymond K. K. Yip, Wilson C. Y. Lam, Peter K. S. Tam, and Dennis N. K. Leung. A Hough transform technique for the detection of rotational symmetry. Pattern Recognition Letters, 15(9):919–928, 1994.

[Ylä-Jääski and Ade, 1996] Antti Ylä-Jääski and Frank Ade. Grouping symmetrical structures for object segmentation and description. Computer Vision and Image Understanding, 63(3):399–417, 1996.

[Yu and Luo, 2002] Ting Yu and Yupin Luo. A novel method of contour extraction based on dynamic programming. In Proceedings of the 6th International Conference on Signal Processing (ICSP), pages 817–820, Beijing, China, August 2002.

[Zabrodsky et al., 1993] H. Zabrodsky, S. Peleg, and D. Avnir. Completion of occluded shapes using symmetry. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 678–679, New York, NY, USA, June 1993.

[Zabrodsky et al., 1995] H. Zabrodsky, S. Peleg, and D. Avnir. Symmetry as a continuous feature. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17:1154–1166, 1995.
