REAL-TIME UNDERWATER 3D SCENE RECONSTRUCTION USING
KINECT V2 TIME OF FLIGHT CAMERA
by
ATIF ANWER
A Thesis
Submitted to the Postgraduate Studies Program
as a Requirement for the Degree of
MASTER OF SCIENCE
ELECTRICAL AND ELECTRONIC ENGINEERING
UNIVERSITI TEKNOLOGI PETRONAS
BANDAR SERI ISKANDAR
PERAK
NOVEMBER 2017
DEDICATION
To my (late) mother and father, whose love, prayers, efforts, wishes, wisdom and
support have made me into what and where I am today
To my brother, whose words of motivation have encouraged me to pursue my
dreams
To my wife, who has supported me through times thick and thin
ACKNOWLEDGEMENTS
In the name of Allah, the Most Beneficent and the Most Merciful. I would like to
thank Almighty Allah for giving me strength and resolve to accomplish this work with
due diligence and determination.
I would like to express my utmost gratitude to my supervisor, Dr Syed Saad Azhar
Ali, for his guidance, support and encouragement during my research. I am also
grateful for his never-ending dedication, motivation and cooperation throughout this
period.
I wish to express my sincere appreciation to my co-supervisor Prof. Fabrice
Mériaudeau, for all the invaluable suggestions, encouragement and enlightening
discussions throughout this time. I am grateful to him for enhancing my analytical
and research skills through his valuable advice and wisdom.
I am also thankful to all faculty members, colleagues and friends at Centre for
Intelligent Signal and Imaging Research (CISIR) for their support. Special mention
and thanks are due towards Dr Khurram Altaf, Amjad Khan, Abul Hassan, and
Sadam Shareen Abbasi for their help and support throughout this period.
Last, but certainly not the least, I would like to thank my parents, brother, wife,
friends and colleagues for their patience, never ending moral support and
encouragement during my study at UTP.
ABSTRACT
Underwater 3D scene reconstruction is used to generate topographic maps and 3D
visualizations of sub-sea geological features and man-made structures. 3D
visualization provides the ability to easily envisage the environment beneath the water
line. However, the underwater environment offers challenging conditions for 3D scene reconstruction. Most of the existing solutions in use, such as LIDARs and sonars, are specialized marine-hardened equipment that is costly, bulky and unable to provide real-time data processing.
This work presents the use of Microsoft Kinect, a commercial RGB-D camera,
for a small scale, economical and real-time underwater 3D scene reconstruction.
The Kinect is operated fully submerged underwater in a customized 3D printed waterproof housing, and is able to successfully acquire data at distances between 350 mm and 650 mm. The RGB and infrared cameras are calibrated, and the acquired time
of flight data is processed to cater for the errors in depth calculation due to the change
of imaging medium. A noise filter is applied to remove the noise in the point cloud
data, without significant loss of features. To accommodate the effects of refraction
due to the sensor housing and water, a fast, accurate and intuitive ray-casting based
refraction correction method has been developed that is applied to point clouds during
3D mesh generation. A software implementation including the noise filter, camera
calibration, time of flight and refraction correction algorithms has been developed by adapting the Kinect Fusion SDK for underwater data processing. Data acquisition
experiments were done in controlled environments with both clear and turbid water
and a mean error of ±6 mm with an average standard deviation of 3 mm is achieved. A
complete dataset consisting of underwater 3D scans of objects has been developed
and released publicly. Areas such as coral reef mapping and underwater localization
and mapping for a robotic solution in shallow waters can benefit from the results
achieved by this research.
ABSTRAK
Pembinaan semula pemandangan 3 dimensi (3D) bawah air digunakan untuk
menjana peta topografi dan visualisasi 3D ciri-ciri geologi dan struktur buatan
manusia bawah laut. Visualisasi 3D memberikan keupayaan untuk mudah
membayangkan persekitaran di bawah garisan air. Walaubagaimanapun, persekitaran
bawah air merupakan persekitaran yang mencabar bagi imbasan dan pembinaan
semula pemandangan 3D.
Kajian ini membentangkan penggunaan Microsoft Kinect, kamera komersial
RGB-D, untuk masa sebenar berskala kecil pemandangan 3D objek bawah air dan
imbasan 3D. Kinect beroperasi dengan cara merendamkannya di dalam air secara
penuh. Ia dilengkapi dengan perumah kalis air yang direka dan dihasilkan dengan
cetakan 3D dan mampu untuk memperolehi data pada jarak antara 350mm hingga
650mm dengan jayanya. Kamera RGB dan infra merah dikalibrasi, dan masa
pergerakan isyarat data diproses untuk menangani ralat dalam pengiraan kedalaman
yang disebabkan oleh perubahan medium pengimejan. Penapis hingar diaplikasikan
bagi menyingkirkan hingar data awan titik tanpa kehilangan banyak ciri penting.
Untuk menampung kesan pembiasan disebabkan oleh perumah penderia dan air,
kaedah pembetulan pembiasan berasaskan ray-casting yang cepat, tepat dan intuitif
telah dibangunkan yang digunakan untuk menunjuk awan semasa penjanaan mesh
3D. Pelaksanaan perisian termasuk penyaring hingar, penentukuran kamera, masa
pergerakan isyarat dan algoritma pembetulan pembiasan telah dikembangkan
menyesuaikan Kinect Fusion SDK untuk pemprosesan data bawah air. Eksperimen
pemerolehan data dilakukan dalam persekitaran terkawal dengan air yang jelas dan
keruh dan min ralat ± 6 mm dengan sisihan standard purata 3mm dicapai. Satu dataset
lengkap yang terdiri daripada imbasan 3D bawah air objek telah dibangunkan dan
dikeluarkan secara terbuka. Kawasan pemetaan terumbu karang dan pemetaan bawah
air dan pemetaan untuk penyelesaian robotik di perairan cetek boleh mendapat
manfaat daripada hasil yang dicapai oleh penyelidikan ini.
In compliance with the terms of the Copyright Act 1987 and the IP Policy of the university, the copyright of this thesis has been reassigned by the author to the legal entity of the university,
Institute of Technology PETRONAS Sdn Bhd.
Due acknowledgement shall always be made of the use of any material contained in, or derived from, this thesis.
© Atif Anwer, 2017
Institute of Technology PETRONAS Sdn Bhd
All rights reserved.
TABLE OF CONTENT
ABSTRACT ........................................................................................................... vii
ABSTRAK ............................................................................................................ viii
TABLE OF CONTENT ............................................................................................. x
LIST OF FIGURES ............................................................................................... xiii
LIST OF TABLES ............................................................................................... xvii
LIST OF ABBREVIATIONS .............................................................................. xviii
LIST OF SYMBOLS .............................................................................................. xix
INTRODUCTION .......................................................................... 1
1.1 Background .................................................................................................. 1
1.2 3D Scanning and Scene Reconstruction for Underwater Applications ......... 4
1.3 Problem Statement ........................................................................................ 6
1.4 Hypothesis .................................................................................................... 7
1.5 Motivation .................................................................................................... 8
1.6 Research Objectives, Impact and Contributions ........................................... 8
1.7 Scope of work ............................................................................................... 9
1.8 Thesis Organization ...................................................................................... 9
DEPTH SENSING AND 3D SCENE RECONSTRUCTION IN
UNDERWATER ENVIRONMENT ........................................................................ 11
2.1 Overview .................................................................................................... 11
2.2 Overview of Depth Sensing Techniques ..................................................... 11
2.3 Light in Underwater Environment .............................................................. 14
2.3.1 Attenuation, Absorption and Scattering of Light in Water ................ 15
2.3.2 Refractive Index and Its Adverse Effects on Underwater Imaging ... 18
2.3.3 Effect of Water Salinity and Temperature on Light Transmission .... 21
2.4 Optical Depth Imaging and 3D Reconstruction in Underwater ................... 21
2.4.1 Structured Light Cameras ................................................................. 24
2.4.2 Time of Flight Depth Sensors ........................................................... 25
2.4.3 RGB-D Cameras ............................................................................... 26
2.4.3.1 RGB-D Cameras in Underwater Environment ...................... 29
2.5 Detailed Overview of Kinect RGB-D Sensors ............................................ 30
2.5.1 Kinect for Xbox 360 (KinectSL) ........................................................ 31
2.5.2 Kinect for Xbox One (KinectToF) ...................................................... 32
2.5.3 Comparison of Kinect Devices ......................................................... 34
2.5.4 3D Scene Reconstruction Using Kinect Fusion ................................ 39
2.5.4.1 Brief Working of Kinect Fusion ........................................... 39
2.5.4.2 Tracking Performance and Reconstruction Volume ............. 41
2.5.5 3D Reconstruction Algorithms for RGB-D sensors .......................... 43
2.6 Proposed Methodology .............................................................................. 46
2.7 Summary .................................................................................................... 48
WATERPROOF CASING DESIGN AND DATA PROCESSING
PIPELINE FOR UNDERWATER 3D DATA ......................................................... 49
3.1 Overview .................................................................................................... 49
3.2 Design and Prototyping of Waterproof Housing ........................................ 49
3.2.1 Transparent Material Selection ......................................................... 50
3.2.2 Casing Structural and Sealing Design .............................................. 53
3.2.3 3D Printing Considerations .............................................................. 56
3.2.4 Structural Analysis of Designed Housing ......................................... 57
3.3 Refraction Correction and distortion removal in Underwater 3D Data ....... 60
3.3.1 Kinect RGB and Depth Camera Underwater Calibration ................. 60
3.3.1.1 Camera Calibration Concepts ............................................... 60
3.3.1.2 Underwater Calibration of Kinect Cameras.......................... 65
3.3.2 Time of Flight Correction in Underwater Environment .................... 66
3.3.3 Distortion Removal for ToF Camera in Underwater Medium .......... 69
3.3.3.1 Refraction Correction of Depth Data in Underwater ............ 69
3.3.3.2 Pincushion Distortion Removal in Depth Images ................. 76
3.3.4 3D Point Cloud Noise Filtering in Turbid Medium .......................... 77
3.3.5 Customized Kinect Fusion Implementation ...................................... 79
3.3.6 Qualitative & Quantitative Performance Criteria for 3D Meshes ..... 83
3.4 Experimental Setup .................................................................................... 85
3.4.1 KinectToF Underwater Dataset and Selection of Test Objects ........... 86
3.4.2 Uncorrected RGB, IR and Depth Images from Submerged Kinect ... 90
3.4.3 KinectToF RGB and IR Camera Calibration ...................................... 91
3.4.4 Real-Time Data Collection, Scanning Rate and Parameters ............. 93
3.5 Summary .................................................................................................... 94
RESULTS AND DISCUSSION ................................................... 95
4.1 Overview .................................................................................................... 95
4.2 Performance of KinectToF Sensor in Underwater Environment ................... 95
4.2.1 Kinect Depth Camera Performance in Underwater Environment ...... 96
4.2.2 Camera Calibration and Distortion Correction Results ..................... 98
4.2.3 Effect of Colour and Material of Scanned Objects ............................ 99
4.3 Qualitative and Quantitative Performance Evaluation ................ 101
4.3.1 3D Reconstruction in Water by Unfiltered Kinect Fusion ............... 103
4.3.2 3D Reconstruction Results after Camera Calibration ...................... 104
4.3.3 3D Reconstruction Results after Median Filtering .......................... 105
4.3.4 3D Reconstruction Results after ToF and Refraction Corrections ... 106
4.4 Comparison with Existing Methods .......................................................... 113
4.5 Summary .................................................................................................. 114
CONCLUSION AND FUTURE WORK .................................... 116
5.1 Contributions ............................................................................................ 117
5.2 Limitations and Future Work .................................................................... 117
5.3 List of Publications ................................................................................... 119
BIBLIOGRAPHY ................................................................................................. 128
APPENDICES
A. Housing Design Drawings
B. Brief Description and Working of Kinect Fusion Interface
LIST OF FIGURES
Figure 1.1: Areas in which underwater 3D imaging is being used extensively such
as sub-sea surveys [3], coral reef preservation [4] and maintenance
[5], underwater robotics [6] etc. ............................................................ 2
Figure 1.2: Autonomous [6] and semi-autonomous [9] robotic exploration .............. 3
Figure 2.1: Taxonomy of depth measurement methods (expanded version of the
one proposed by Lachat et al. [15]) ..................................................... 13
Figure 2.2: Spectral distribution of the electromagnetic spectrum .......................... 14
Figure 2.3: Absorption coefficient for light wavelengths in water at 20° C [17] ..... 16
Figure 2.4: Image taken of spectral (right) and fluorescent (left) colour paint
samples taken (a) outside water (b) underwater at a depth of 60 m ..... 18
Figure 2.5: Refractive index of water variation with temperature [16] ................... 20
Figure 2.6: Popular techniques of active optical 3D imaging .................................. 22
Figure 2.7: 3D scene reconstruction process overview ........................................... 23
Figure 2.8: KinectSL and its internal structure [52] .................................................. 32
Figure 2.9: KinectToF and its internal structure [54] ................................................ 33
Figure 2.10: The 3D image sensor system of KinectToF [55] ................................... 33
Figure 2.11: Kinect measured depth vs actual distance ........................................... 35
Figure 2.12: Kinect Fusion overall workflow as given by Newcombe et al [65] ..... 40
Figure 2.13: ICP for aligning point clouds acquired by Kinect ............................... 40
Figure 2.14: A cubic volume is subdivided into a set of voxels which are equal in
size and defined per axis. [66]. ............................................................ 43
Figure 2.15: Workflow for generating real-time 3D meshes from Kinect sensor,
in under water environment. Coloured blocks are the contributions
of this research. ................................................................................... 47
Figure 3.1: Transmission percentage of different wavelengths of light through
3mm Acrylic [74]. The red band is NIR wavelength used by
KinectToF ............................................................................................. 51
Figure 3.2: 3D mesh reconstruction results in open air through various
thicknesses of Perspex (a) Original scene (b) No Perspex (c) 2mm
(d) 3mm (e) 5mm (f) 8mm .................................................................. 53
Figure 3.3: Cable gland design (a) Exploded view (b) Cross-section view .............. 54
Figure 3.4: Designed housing assembly (a) housing only (b) housing with
KinectToF (c) exploded view of the assembly ....................................... 55
Figure 3.5: Zoomed in view of the porosity between fused 3D printed layers due
to FDM process .................................................................................... 56
Figure 3.6: (a) Increasing pressure exerted on a submerged object in water (b)
simulated linear relationship of pressure in water and depth ................ 57
Figure 3.7: Structural strength analysis of the designed KinectToF housing (a) Von
Mises stress distribution (b) displacement due to pressure (c) 1st
principle stress (d) 3rd principle stress (e) Safety factor results ............ 58
Figure 3.8: Structural strength analysis of the designed cable gland (a) Inside
edge Von Mises Stress distribution (b) Outside surface Von Mises
Stress distribution (c) Inside edge displacement due to pressure (d)
Outside surface Displacement due to pressure ..................................... 59
Figure 3.9: Camera calibration process.................................................................... 61
Figure 3.10: Types of distortion in an image [76] .................................................... 64
Figure 3.11: Pictures of (a) black and white calibration checkerboard and (b)
colour checkerboard taken underwater from the Kinect RGB camera.
............................................................................................................. 66
Figure 3.12: Calculating the corrected time of flight values .................................... 68
Figure 3.13: Simulated depth distance (mm): measured (red) vs actual (blue) ........ 68
Figure 3.14: Formation of virtual image at due to refraction ................................... 70
Figure 3.15: Calculating refraction of a ray of light for two materials resulting in
a shift in perceived depth ..................................................................... 71
Figure 3.16: Spherical to image coordinate conversion ........................................... 73
Figure 3.17: Projections of a depth point on the Kinect sensor image plane ............ 74
Figure 3.18: Methodology to trace the light ray path for each depth pixel .............. 74
Figure 3.19: (a) Bottom plane (blue) is the simulated curved point cloud whereas
the top (yellow) is the refraction corrected point cloud (b) A plot of
the calculated error distance that grows larger error as the distance
from central axis increases. .................................................................. 75
Figure 3.20: Front and left views of acquired noisy point cloud .............................. 78
Figure 3.21: The main user interface and sub-windows .......................................... 80
Figure 3.22: Kinect Fusion implementation flowchart (Kinect Fusion SDK
function names are written in red) (Page 1) ......................................... 81
Figure 3.23: Kinect Fusion implementation flowchart (Kinect Fusion SDK
function names are written in red) (Page 2) ......................................... 82
Figure 3.24: Qualitative analysis comparison process ............................................. 83
Figure 3.25: (a) Target point cloud (green) and reference point cloud (yellow)
(b) finding the distances of the point clouds (c) error heat map ........... 85
Figure 3.26: Experimental setup for data acquisition at swimming pool and
offshore experiment facility at UTP .................................................... 86
Figure 3.27: Raw images captured from KinectToF cameras under water (a) RGB
(b) depth (c) infrared ........................................................................... 91
Figure 3.28: Infrared camera calibration underwater (a) original images (b)
enhanced IR images (c) calibration images used to calculate the
parameters ........................................................................................... 92
Figure 4.1: Reported vs actual depth of KinectToF in underwater environment ....... 96
Figure 4.2: Original depth data reported by KinectToF ............................................. 97
Figure 4.3: RGB camera calibration results in air and under water (a) focal length
and principal axis values (b) distortion coefficients ............................ 98
Figure 4.4: IR camera calibration results in air and under water (a) focal length
and principal axis values (b) distortion coefficients ............................ 99
Figure 4.5: IR image (a) original (b) undistorted using calibration parameter ...... 100
Figure 4.6: Dense vs sparse point cloud under water ............................................ 101
Figure 4.7: Steps for 3D mesh generation in underwater environment .................. 102
Figure 4.8: (a) RGB image of scene (b) 3D reconstruction by Kinect Fusion only
.......................................................................................................... 103
Figure 4.9: (a) 3D reconstruction by Kinect Fusion only (b) Mesh after applying
camera calibration ............................................................................. 104
Figure 4.10: Noise filtering results (a) results after camera calibration (b) mesh
after noise filtering ............................................................................ 105
Figure 4.11: ToF and refraction correction results (a) mesh with median filter (b)
mesh after applying ToF and refraction correction ............................ 106
Figure 4.12: Alignment error maps of 3D reconstructed mesh of a submerged
swimming pool wall compared with an ideal plane, showing the
refraction correction results. green represents 0 mm error, red
represents ≥ +20 mm error, blue represents ≥ -20mm error. (From
Left to right: Ideal reference plane, original Kinect Fusion mesh,
after camera calibration, after median filtering, ToF and refraction
corrected mesh) .................................................................................. 108
Figure 4.13: Results of RGB mapping on the generated 3D mesh (a) RGB image
acquired (b) 3D reconstructed scene (c) colour mapped mesh ........... 109
LIST OF TABLES
Table 1.1: Comparison of popular underwater 3D depth sensing techniques ............ 5
Table 2.1: Comparison of popular RGB-D sensors ................................................. 28
Table 2.2: Summary of previous work on RGB-D sensors under water .................. 30
Table 2.3: Specification comparison of KinectSL and KinectToF .............................. 36
Table 2.4: Previous work done on characterizing KinectToF properties ................... 37
Table 2.5: Summary of related work done on scene reconstruction and mapping ... 44
Table 3.1: Stress analysis simulation results summary ............................................ 60
Table 3.2: Measured vs actual distance measured by Kinect under water ............... 69
Table 3.3: Visual parameters for qualitative analysis .............................................. 83
Table 3.4: Objects selected for scanning and their characteristics ........................... 89
Table 4.1: RGB camera calibration results in air and underwater ........................... 98
Table 4.2: IR camera calibration results in air and underwater ............................... 99
Table 4.3: Effect of colour and material of objects on underwater NIR scanning . 100
Table 4.4: Front/side view of 3D reconstructed submerged swimming pool wall . 107
Table 4.5: Additional object scan results in different conditions........................... 110
Table 4.6: Error heat maps and gaussian distribution of error histogram of various
objects scanned underwater. Objects scanned are compared to original
3D CAD model as well as with the 3D printed model scanned with
KinectToF in the air ............................................................................... 111
Table 4.7: Summary of comparison with similar work ......................................... 113
LIST OF ABBREVIATIONS
AUV Autonomous Underwater Vehicle
ROV Remotely Operated Vehicles
LIDARs Light Detection and Ranging sensors
SL Structured Light
ToF Time of Flight
KinectSL Kinect v1 (Kinect for Xbox 360)
KinectToF Kinect v2 (Kinect for Xbox One)
IR Infrared
NIR Near InfraRed
VPM Voxels Per Meter
IP Ingress Protection
FOV Field of view
ABS Acrylonitrile Butadiene Styrene
PLA Poly Lactic Acid
FDM Fused Deposition Modelling
ICP Iterative Closest Point
COTS Commercial off-the-shelf
FPS Frames per second
LIST OF SYMBOLS
I Irradiance
k Attenuation coefficient
a(λ) Absorption coefficient
b(λ) Scattering coefficient
λ Wavelength
f Focal length
c Speed of light
ν Velocity
η Index of refraction
θ, ϕ Angles
t Time
r Radial distance
d Depth distance
fmod Frequency of modulation
px Pixel
vpm Voxel per meter
Γ Transmittance
P Pressure
atm Atmospheric pressure
K Intrinsic matrix
R Rotation matrix
T Translation matrix
x Uncorrected coordinates on projection plane in x-axis
y Uncorrected coordinates on projection plane in y-axis
z Scale factor
X Coordinates in real world along x-axis
Y Coordinates in real world along y-axis
Z Coordinates in real world along z-axis
q Principal point
γ Skew coefficient
m Pixel coordinates in x-axis
n Pixel coordinates in y-axis
x’ Undistorted coordinates on projection plane in x-axis
y’ Undistorted coordinates on projection plane in y-axis
p Tangential distortion coefficients
α, β, χ Radial distortion coefficients of lens
ζ Linear scaling of image
shiftd Shift in depth
σ Standard deviation
INTRODUCTION
1.1 Background
Approximately 70% of the earth's surface is submerged under water and is hence of immense interest to researchers, scientists and engineers for identifying, exploring and understanding the planet earth and its diverse ecosystems. Unfortunately, most of these rich ecosystems lie at immense depths that are yet to be fully explored, and access to them poses an enormous challenge, inhibiting detailed exploration and understanding. Efforts have been made since time immemorial to map the
uncharted seas and explore the treasures hidden beneath the vast ocean surface. With
the advent of revolutionary maritime technologies such as diving gear and submarines, notably the 15th and 16th century concepts of Leonardo da Vinci, access to the sub-sea environment was pioneered, stirring a renewed interest in understanding the surfaces beneath lakes and oceans and their wonders. Cartographers developed sub-sea
topographic mapping techniques known as bathymetry, to map the geological and
geographical features. Maritime maps of busy harbours and sea shores were
imperative for the development and extension of widespread maritime navigation and
trade activities between nations around the world.
As naval technologies for navigation and mapping improved, bathymetric maps grew in coverage and accuracy. The discovery of vibrant ecosystems of
sub-sea life such as coral reefs, particularly in the late 17th century, led to detailed
mapping to study these wonders of nature. Coral reefs are fragile ecosystems, partly
because they are susceptible to water temperature and are under threat from climate
change, oceanic acidification, blast fishing, overuse of reef resources and harmful
land-use practices, including urban and agricultural runoff and water pollution, which
can harm reefs by encouraging excess algal growth. Preservation of these wonders of
nature requires the creation of accurate maps of reefs and surrounding areas [1], so
the lost or damaged reefs can be re-grown in their original grandeur and beauty.
With the growth of human civilization, maritime activities in deep seas required
greater understanding of the sub-sea bed rock for safe and efficient sea travel, mostly
to promote trade between nations. As these activities continued to increase, the
discovery of submerged ancient archaeological sites of civilizations of past, as well
as discovery of previously undocumented shipwrecks took the scientific and historian
community by storm. Understanding and recording these discoveries gave precious
insight of past civilizations and the evolution of human civilization itself. With the
advent of electronics devices and sensors, a significant rise has been seen in the speed
and accuracy of underwater maps. As summarized in figure 1.1, discovery and
exploration of geological structures such as hydrothermal vents, coral reefs, along
with study of oil pipeline inspections [2], offshore structure maintenance, shipwrecks
and a relatively newer phenomenon of aircraft crash exploration has benefited
extensively with modern imaging and mapping technologies.
Figure 1.1: Areas in which underwater 3D imaging is being used extensively such as
sub-sea surveys [3], coral reef preservation [4] and maintenance [5], underwater
robotics [6] etc.
Standard bathymetric maps are also being augmented with 3D visualizations as the technology for 3D imaging has become commonplace. Since the late 20th century, Remotely Operated Vehicles (ROVs) as well as Autonomous Underwater Vehicles (AUVs) have been used extensively in underwater exploration missions, like the ones shown in figure 1.2. These vehicles carry a host
of sensors as well as multiple imaging devices to provide real-time as well as recorded
visual feedback to the operators. This data is often of critical importance for various
applications such as health monitoring and inspection of underwater man-made
structures such as oil-rigs, as well as for maintaining an updated record of
archaeological sites and health monitoring of coral reefs. Enabling data acquisition
especially for visual servoing [7] in real time is extremely important for agile and
autonomous navigation of robots as discussed by [2] and [8] for unknown and
unstructured environments. 3D mapping data in real time provides active perception
that is required for path-planning, localisation as well as control purposes, especially
in the absence of or for augmenting inertial navigation.
Figure 1.2: Autonomous [6] and semi-autonomous [9] robotic exploration
3D scanning and reconstruction is also being used extensively for ship hull
inspection, mapping the coral reefs, archiving archaeological sites [10], preserving
and analysing sunken ships [11] and war-planes, etc. Historical preservation is one of
the key attributes of mankind that enables a permanent record for the generations to
come so that they can learn from their past. Preservation of history and ecosystems like the coral reefs is a big responsibility for each generation of scientists and
engineers. 3D data also provides useful insight on various factors of the unobservable
environment, aiding in sometimes critical decisions as well as aiding in preventive
maintenance before any critical failure occurs.
1.2 3D Scanning and Scene Reconstruction for Underwater Applications
3D scanning and scene reconstruction is a technique to digitize real world objects and
surfaces into 3-dimensional graphical models or meshes. These 3D models can then
be used for various applications, ranging from prototyping, record preservation, maintenance and inspection to modern entertainment applications such as Augmented Reality (AR) or Virtual Reality (VR). For engineering and industrial applications, it enables qualitative and quantitative analysis of objects by comparison to the original design intent, verifying the product post-production or after a certain period of use in its real environment. The latter
is especially true in harsher environments such as underwater, where changes such as
erosion, rusting, and other deteriorating effects due to weather and surrounding
conditions are common. The acquisition of the geometric description of a dynamic scene has always been a very challenging task, yet it is a compulsory requirement for robotics, where the robot must know the description of the environment in order to actively and safely perform its duties, especially in tandem with human operators and co-workers.
3D scanning or scene reconstruction can be done by either contact or non-contact
data collection techniques. Contact based techniques such as a Coordinate Measuring
Machine (CMM) use physical contact with the object for precise measurement,
whereas non-contact scanning techniques use some form of active scanning with
sonars, ultrasound, x-ray or optical imaging in different wavelengths of light. As
contact sensing requires the object to be approachable and preferably isolated, it
cannot be used as a flexible solution for in-place measurement of objects. Non-contact
methods provide the benefit of scanning objects in place, in their original environment. This is much more beneficial for inspection, maintenance and
preservation activities where moving the object is not a feasible solution.
Nowadays, 3D scanning is being utilized much more regularly due to significant
advancement of 3D depth sensing techniques such as stereo imaging, depth sensors
such as sonars, Light Detection and Ranging sensors (LIDARs) and commercial
depth cameras. For underwater applications, 3D scanning and volumetric
reconstructions from non-contact sensors are being used extensively for ship hull
inspection [12], mapping the coral reefs, scanning underwater terrain, sunken ships
and war-planes, just to name a few. Except for camera-based stereo imaging, the specialized, more expensive marine-hardened solutions provide long-range 3D scene reconstruction using offline processing of previously collected data. RGB cameras (in monocular, stereo or multiple configurations) offer methods for real-time 3D reconstruction in the underwater environment, but are heavily dependent on the presence of ambient or artificial lighting and have limited range due to the properties of light
propagation in water. To acquire detailed underwater maps of areas of interest, a
detailed 3D scene is generally reconstructed using stationary equipment, divers and autonomous or semi-autonomous robotic vehicles (AUVs). However, the underwater environment itself offers a unique challenge for mapping and 3D scene reconstruction of both geological and man-made underwater structures. A brief comparison of the various types of depth sensors used for underwater 3D scanning is given in table 1.1.
Table 1.1: Comparison of popular underwater 3D depth sensing techniques
Property | LIDAR | Sonar | RGB Imaging
Range | >40 m | >30 m | Typically a few meters (depends on turbidity of water)
Spatial resolution | High | Medium | Medium
Effect of temperature | Not affected | Greatly affected | Nil
Ambient light | Not affected | Not affected | Highly dependent
3D scanning requires state-of-the-art sensing and instrumentation technologies
that were only available to research labs or major conglomerates until recently. With
the release of commercial depth sensors in the mid-to-late 2000s, this changed drastically. Low-cost depth sensors based on Structured Light (SL) and Time of Flight
(ToF) technologies emerged in the hobbyist and commercial market. These sensors
were much cheaper than the traditional specialized and accurate industrial sensors
which allowed them to be used by hobbyists and roboticists for 3D mapping and depth
sensing. Companies such as Microsoft, Asus and Intel released depth sensors such as
the Kinect™ 360 (2010), Xtion™ (2011), Kinect™ v2 (2014) and RealSense™
(2014), primarily as motion capture solutions. The lower cost of sub 200 US$ and off
the shelf availability of these sensors led to a sweeping increase in the robotics
community. Together with the high scanning resolution and open source software
libraries, these sensors provide real-time, small scale and efficient 3D scanning
sensors that can be used for all sorts of research and commercial purposes alike.
However very little research has been done in utilization of these sensors for
underwater applications, as discussed in the following sections.
1.3 Problem Statement
Over the last three to four decades, terrestrial applications such as robotics and
autonomous vehicles have seen extensive research and growth in 3D mapping and
scene reconstruction techniques. Issues such as noise reduction, 3D camera
calibration and environmental effects for outdoor environments are being researched
extensively. 3D reconstruction in the underwater environment, however, is a much more
challenging task due to the requirement of expensive, specialized equipment and
services as well as the challenges faced due to the harsh environment and properties
of water as an imaging medium. The prohibitive cost of acquiring up-to-date data
through traditional methods, such as airborne LiDARs, advanced ship-based sonars or static 3D scanning sensors, and services such as 2G Robotics [5] and 3D at Depth™ [13], limits the work of many researchers and organizations with limited budgets. The
development of an economical sensor that can give the same or better level of
performance for small scale researchers is still an area open for research.
According to the literature reviewed, there is a significant gap in the availability of cost-effective, real-time underwater 3D scene reconstruction sensors and techniques. Moreover, the existing methods are not ideally suited for finely detailed reconstructions of underwater scenes. By using a commercial RGB-D sensor, small-scale research activities on real-time scene reconstruction in underwater environments can benefit greatly from the reduced cost. Until now, very limited research has been done on testing RGB-D sensors in a real underwater environment, and the work done by Digumarti et al. [14] is the only available refraction correction technique for adapting an RGB-D (structured light) sensor to the underwater environment; however, the proposed refraction model is processor intensive and limits the real-time generation of 3D scenes.
1.4 Hypothesis
As detailed in section 1.3, there is a gap in the availability of economical and small-
scale sensors for underwater applications. Nowadays with the advent of commercial
RGB-D sensors, the possibility of fast, accurate and low-cost 3D sensing has become
available. However, even almost a decade after the launch of these sensors, very little work has been done on using them in underwater 3D scanning applications.
Sensors like Kinect v1 (denoted as KinectSL from here onwards) and Kinect v2
(denoted as KinectToF from here onwards) provide the ability for real-time 3D scene
reconstruction, which can then be used for a multitude of purposes ranging from
robotic navigation to simple scene reconstruction for visualizing and understanding
the condition of the surrounding environment. It is theorized that the development and testing of low-cost, real-time techniques can be done using a commercial depth sensor, the Microsoft® KinectToF. It is a time of flight depth sensor
costing approximately 120 US$, primarily designed to be used as a Natural User Interface (NUI) device for the gaming console Xbox™ One. It is expected to provide
a small-scale, low-cost and short-range 3D scene reconstruction solution with high spatial resolution for detailed scene visualization. These sensors can easily be mounted on ROVs, offering a great opportunity for 3D navigation and scanning at
greater depths of water. As this is still an emerging field with significant potential for further research, and the work done on it so far is very limited, there is additional motivation to explore this area for underwater applications.
1.5 Motivation
The core motivation behind any engineering problem is the development of better and more cost-effective technology and techniques that build on previous technological advancements and push forward the overall state of the art. Keeping this primary cause in view, the motivation behind this
research work is exploring and improving the current state of underwater 3D sensing
technology and processes for environmental and robotics applications; while having
a constructive impact on underwater monitoring and preservation of natural and man-
made structures. To achieve this, this research work is based on the use of low-cost,
commercial off-the-shelf (COTS) sensors such as the Microsoft Kinect v2 (KinectToF)
that could bring cost-effective, robust and real-time 3D underwater sensing and mapping within the reach of budget-limited research activities.
1.6 Research Objectives, Impact and Contributions
The objectives of this research work are enumerated as follows:
1. To investigate the performance of the near-infrared, time-of-flight KinectToF sensor in an underwater environment.
2. To develop, implement and characterize a real-time solution for 3D reconstruction of underwater scenes.
Since insufficient work has been done in the area of commercial depth sensors in
underwater environment, as discussed in detail in section 0, there remains a significant
technology gap and margin for research and exploration. The major contribution of
this thesis is the adaptation of an economical depth sensor for underwater
environment, without any hardware modifications. The methodology proposed in this
thesis enables real-time 3D scene reconstruction using an un-modified KinectToF. The
undesirable and adverse effects of using an imaging device underwater, such as distortion, refraction, effects due to the housing and noise, are catered for. An intuitive
and computationally efficient methodology of refraction correction, inspired by the standard ray-tracing techniques used in computer graphics, is proposed, keeping in mind the high-performance requirements of real-time reconstruction.
1.7 Scope of work
The scope of this work, as defined by the research objectives, is limited to
the investigation of the performance of the KinectToF camera in an underwater environment and the development of methods and algorithms to cater for the negative effects encountered during data acquisition. The scope of this research does not include 3D object and plane
segmentation from the reconstructed mesh or any object recognition approach to
identify the object being reconstructed from the surroundings. These additional
objectives have been identified to be part of the future work that can be done to extend
the research for real world application scenarios.
1.8 Thesis Organization
This thesis is organized into five chapters. In this chapter, we have established that
underwater 3D imaging is significant for various commercial and scientific purposes.
This thesis attempts to add to the current state of the art of 3D imaging in the underwater environment by adapting a well-known commercial depth sensor and real-time 3D scanning algorithms that are widely used in normal open-air environments, to work with nearly the same accuracy in underwater environments. The remainder of the
thesis covers various aspects of this research and is arranged as follows.
Chapter 2 covers the current state of the art in 3D underwater imaging as part of
the literature review. The chapter covers an introduction on the properties of different
wavelengths of light in water and various effects such as refraction, absorption and
scattering of light etcetera. Since this work is focused on the Microsoft Kinect, which
is a Near Infrared (NIR) device, the effect of water on infrared wavelengths is covered in detail. Traditional underwater 3D imaging and scene reconstruction techniques and
sensors are sparsely covered to establish the research gap and the contribution of this
thesis. The KinectToF sensor specifications and its properties are also explained in
detail.
The methodology employed for carrying out this research is defined in chapter 3,
which comprises two main parts. The first part deals with the development of a
special housing that has been designed for water proofing without diminishing the
performance of the sensor. The complete hardware design intent and simulation
results are covered. The second part of chapter 3 covers the experiment design,
data acquisition setup and main contribution of this thesis including algorithms and
techniques developed for real time 3D scene reconstruction in underwater
environment. Issues faced, such as noise and refraction, and the corresponding corrections are discussed at length.
The results including qualitative and quantitative analysis have been discussed in
chapter 4. Comparison of the results achieved with relevant techniques and
comparison of aerial reconstruction versus underwater reconstruction are deliberated
in detail. Chapter 5 covers the conclusions and future work proposed based on the
results achieved. Materials such as a tutorial on the software developed for the
research work and the housing design details and specifications are given as
appendices at the end of this thesis.
DEPTH SENSING AND 3D SCENE RECONSTRUCTION IN UNDERWATER
ENVIRONMENT
2.1 Overview
This literature review begins by introducing various commonly used depth sensing techniques, with emphasis on optical depth sensing in the underwater environment. This is followed by details about the characteristics of light and the effect of water on its propagation; issues affecting light, such as attenuation, absorption, scattering, refraction, and the effects of salinity and temperature, are then discussed. The working principles of passive sensors like RGB cameras and of active optical sensors such as structured light and time of flight sensors are then covered. This is followed by technical and working details of
both Kinect sensors with focus on KinectToF which has been utilized in this research.
Descriptions on how Kinect acquires and generates the depth image and its various
benefits and issues are deliberated in detail.
2.2 Overview of Depth Sensing Techniques
Depth sensing techniques can be classified into two distinct categories: contact and non-contact. Non-contact methods have the advantage of acquiring depth data for 3D
scanning of objects in their original environment, without the need to interfere with the working conditions. Especially underwater, since access to objects and surfaces deep in the water is constrained, non-contact methods are preferred. Non-contact sensing methods are mostly based on reflective and
transmissive techniques. Reflective methods provide much more ease of use and have
been the centre of research for several decades. The reflective methods can either be
optical or non-optical. Non-optical methods cover methods and specialized sensors
such as radars, sonars etcetera. Most of the non-optical methods are inspired by the
natural echolocation techniques used by mammals such as bats and dolphins. Optical
methods comprise one or more imaging sensors that work by capturing light in the scene. These methods generally work on the visible spectrum of light; however, several techniques work in the infrared or ultraviolet domain as well. The source of light divides optical sensing methods into active and passive types. For passive optical sensing, the light source is the ambient light that comes from the scene, generated from any source, natural or artificial. For active optical sensing, the light source is controlled as part of the sensing system and can be in the visible spectrum of light or in the infrared or ultraviolet wavelengths. The light can also be modulated, follow a specific pattern that is
detected by the sensors or be omnidirectional (uniform spatial distribution).
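The modulated-light case is the basis of continuous-wave time-of-flight ranging, the technique used by sensors such as the KinectToF and discussed further in section 2.4.2. A minimal, hedged sketch is given below; the 80 MHz modulation frequency and the phase value are assumed illustrative numbers, not parameters of the Kinect hardware.

```python
import math

# Illustrative sketch of continuous-wave time-of-flight ranging with modulated light.
# The modulation frequency and phase shift below are assumed example values.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def distance_from_phase(delta_phi_rad: float, f_mod_hz: float) -> float:
    """Distance from the phase shift of the returned modulated signal:
    d = c * delta_phi / (4 * pi * f_mod); the 4*pi accounts for the round trip
    of the light from emitter to target and back to the sensor."""
    return SPEED_OF_LIGHT * delta_phi_rad / (4.0 * math.pi * f_mod_hz)

# Example: a measured phase shift of pi/2 rad at an assumed 80 MHz modulation frequency
print(distance_from_phase(math.pi / 2, 80e6))  # ~0.47 m
```

The unambiguous range of such a measurement is c / (2 f_mod), which is why phase-based ToF cameras typically combine several modulation frequencies.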
Optical sensing methods and depth imaging have several strengths alongside some
limitations. On one hand, the main strengths are that the method is non-contact, and
can be used for objects and surfaces from a distance. On the other hand, the same
strength becomes its limiting factor. Only visible portions of the surface or object can
be measured and occlusion is a limiting factor in depth measurement. Also, optical
methods are sensitive to the properties of interaction of light with the surface of the
target object as well as the intermediate medium between the sensing device and the inspected object; therefore, common features such as transparency, reflectance or
absorbance of light (single or multiple wavelengths) are a major concern when doing
distance measurements. A taxonomy of depth measurement is given in figure 2.1,
which is an expanded version of the one provided by Lachat et al. [15]. The coloured
boxes leading to the KinectToF are the focus of this work.
Figure 2.1: Taxonomy of depth measurement methods (expanded version of the one
proposed by Lachat et al. [15])
For the underwater environment, optical sensing methods face additional issues due to the properties of light in water as a transmission medium. Light properties vary according to the properties of the transmission medium, and water, being a denser medium than air, has a distinct effect on the transmission of light. Accordingly, optical sensors under water behave differently than in open air. Therefore, the imaging sensor itself or the acquired image has to be modified to accommodate these effects. The
behaviour of light in water is discussed in detail in the following sections. As the
focus of this thesis is on active optical depth sensing methods, details of the working
principle of active optical sensors are discussed, with emphasis on ToF as it is
the working method used by KinectToF.
2.3 Light in Underwater Environment
Light has a dual nature of an electromagnetic wave and particle (photons). Therefore,
the effects on light in a medium are the result of both its electromagnetic and particulate nature. The electromagnetic spectrum of light spans an extensive range of
wavelengths comprising both visible and invisible parts. Visible light falls between the wavelengths of 400 nm and 700 nm, with colours starting from violet at 400 nm and
going towards red at 700nm, with each intermediate colour having its specific band
of wavelength in the spectrum, as shown in figure 2.2. The wavelengths below 400 nm are ultraviolet radiation, whereas wavelengths of light longer than 750 nm and up to 1000 μm lie in the Infrared (IR) region. The infrared region is broadly subdivided into the Near Infrared Region (NIR), spanning from 750 nm to 1400 nm, and the Far Infrared (FIR), from 1500 nm to 1000 μm. This exact subdivision varies with standards and uses;
however, the above-defined division is the most commonly used one.
Figure 2.2: Spectral distribution of the electromagnetic spectrum
The underwater environment affects light in multiple ways, the most prominent being the absorption of light as we go deeper in water. The rate of absorption of each colour, however, is different. The result of this different rate of absorption is the visible
change of colours of objects submerged in water. Colours such as red, orange and
yellow appear to be overcome by a strong hue of green and blue. Other colours also
show the same effect, with the effect getting more pronounced as we go deeper in the
water. This ultimately leads to the loss of visibility to the human eye as the entire visible spectrum is absorbed in water. This absorption also has an impact on
underwater imaging sensors, which are discussed in detail in the ensuing sections.
2.3.1 Attenuation, Absorption and Scattering of Light in Water
Human vision perceives colour by detecting the wavelength of light bouncing off an object or passing through a medium. An object appears to be of the particular colour whose wavelength is reflected off the object's surface; the object absorbs the remaining wavelengths of the visible spectrum of light. Water has the inherent property
of attenuating the entire visible and invisible electromagnetic spectrum, with a
different rate of absorption for various wavelengths of light. The wavelengths of light
that have less attenuation can penetrate deeper in water. Attenuation is defined as the
reduction in intensity of the light beam with respect to distance travelled through a
medium. Mathematically, the attenuation of a wavelength of light with surface irradiance I_o at a particular depth is given by [16] in eqn. (1.1):

I_z = I_o e^(−kz)      (1.1)

where:
z = depth
I_z = irradiance at depth z
I_o = irradiance at the surface (depth = 0)
k = attenuation coefficient (m⁻¹)
The attenuation of light therefore increases exponentially with the depth of water, leading to complete absorption within a very short distance. The attenuation coefficient comprises two coefficients, as given in eqn. (1.2):

k(λ) = a(λ) + b(λ)      (1.2)
where:
a(λ) = absorption coefficient
b(λ) = scattering coefficient
Figure 2.3: Absorption coefficient for light wavelengths in water at 20° C [17]
So, the attenuation or loss of light in water is governed by the combined effects
of scattering and absorption and increases exponentially over the length of travel in a
medium. The absorption and scattering effects of light in water can be broadly
assumed to be due to the energy absorbing molecular structure of water and effect of
non-visible particles in water, respectively. The absorption coefficient a(λ) is a measure of the conversion of radiant energy to heat and chemical energy. It is
numerically equal to the fraction of energy absorbed from a light beam per unit of
distance travelled in an absorbing medium [18]. The absorption coefficient of light in
the visible and infrared spectrum is given in figure 2.3. The scattering coefficient b(λ) is equal to the fraction of energy dispersed from a light beam per unit of distance
travelled in a scattering medium. Light scattering changes the direction of photon
transport, “dispersing” them as they penetrate a sample, without altering their
wavelength. For example, water with b(λ) of 1 cm⁻¹ will scatter 63% of the energy out of a light beam over a distance of 1 cm, whereas another sample with b(λ) of 0.1 cm⁻¹ will scatter the same proportion of energy over 10 cm. Both absorption and scattering reduce the light energy in a beam as it travels through a sample. The scattering coefficient of pure water is less than 0.003 cm⁻¹ [18], so light scattering has a smaller influence as compared to absorption.
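To make eqns. (1.1) and (1.2) concrete, the short sketch below reproduces the 63% scattering figure quoted above and illustrates the effect of exponential attenuation on a near-infrared time-of-flight measurement. The NIR absorption value used is an assumed, order-of-magnitude figure for clear water and not a value measured in this work.

```python
import math

def transmitted_fraction(k_per_cm: float, path_cm: float) -> float:
    """Fraction of irradiance remaining after a path through water,
    I_z / I_o = exp(-k * z), with k(lambda) = a(lambda) + b(lambda)."""
    return math.exp(-k_per_cm * path_cm)

# Scattering example from the text: b = 1 cm^-1 removes ~63% of a beam over 1 cm.
print(1.0 - transmitted_fraction(1.0, 1.0))   # ~0.632

# Assumed NIR absorption of roughly 0.04 cm^-1 near the KinectToF wavelength:
# a target 0.5 m away implies a ~1 m round trip for the emitted light.
print(transmitted_fraction(0.04, 100.0))      # ~0.018, i.e. only ~2% of the light returns
```

Even with this rough value, the round-trip loss suggests why the usable underwater range of an NIR time-of-flight sensor is limited to well under a metre.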
This varying rate of absorption of different wavelengths of light is the reason
why objects submerged in water bodies appear bluish-green. Several experiments
have been carried out by researchers to understand the behaviour of the visible
spectrum in underwater. The images in figure 2.4 were taken in the Gulf of Mexico
[19] at a depth of 60 feet with a visibility of approximately 60 m. Samples of different
fluorescent (left) and spectral (right) colours were used. As visible in the image, several of the colours are completely attenuated and appear almost black. Orange colours appear olive-green, and greens become lighter, appearing closer to yellow. Blue and indigo retain their original appearance, whereas violet appears closer to black. Alternately, for the fluorescent colours, the appearance did not change significantly. These results show that fluorescent colours are less susceptible to colour attenuation than spectral colours underwater. For spectral colours, the blue
and green wavelengths have the lowest rate of absorption in water, whereas the rest of the light wavelengths are absorbed much faster. Thus, below a few meters of water, an object's original colour appears to be heavily infused with a greenish-bluish tint. This makes objects lose their original appearance and makes visual analysis in the visible spectrum range difficult. Several techniques are
being explored by researchers such as the work being done by Khan et al. [20] and
[21] etc. that provide fast and adaptive methods to restore the original colours of the
images taken underwater.
Figure 2.4: Spectral (right) and fluorescent (left) colour paint samples photographed
(a) outside water (b) underwater at a depth of 60 m
Just as in the visible wavelengths of the electromagnetic spectrum, water also
absorbs the Infrared (both near and far) and Ultraviolet (UV) wavelengths, shown in
figure 2.3. The rate of absorption for the NIR and UV wavelengths is considerably higher
than for the visible spectrum. Due to this high rate, most of the infrared and
ultraviolet radiation is absorbed within the first few metres of water. Conversely, if
the IR or UV source is submerged, the distance that these wavelengths can travel
is severely reduced. Even if the source is sufficiently high powered, the distance
will still be much shorter than that of visible light generated by a submerged
light source. For such systems to work underwater, lasers at selected wavelengths
such as 532 nm (green) and 440 nm (blue) have been developed that can
penetrate much further in water. Lasers, being high-powered and coherent
beams of light, suffer much less attenuation than non-focused light.
2.3.2 Refractive Index and Its Adverse Effects on Underwater Imaging
The refractive index n of a material is a dimensionless number that describes how
much light is bent when entering or exiting a medium. Refraction is the bending of a
light ray when it enters a medium where its speed is different from the incident
medium. The light ray is bent or refracted toward the normal when it passes from a
less dense medium to a denser medium, at the boundary between the two media. The
amount of bending depends on the indices of refraction of the two media and is
described quantitatively by Snell's Law, as given in equation (1.3):
sin θ_1 / sin θ_2 = v_1 / v_2 = λ_1 / λ_2 = n_2 / n_1   (1.3)
Snell's Law states that the ratio of the sines of the angles of incidence and
refraction is equivalent to the ratio of phase velocities in the two media, or equivalent
to the ratio of wavelengths of light in the two mediums, or is equal to the reciprocal
of the ratio of the indices of refraction. So, a medium with a refractive index higher than
that of air (n = 1) bends the light towards the normal, and vice versa. The refractive index
of a medium like water depends on the temperature and on the wavelength of light, as
shown in figure 2.5. As is visible in the graph, there is only a slight variation of refractive
index with respect to temperature; the change becomes noticeable only if the temperature
varies approximately 25° to 30° from the ambient temperature. Therefore, for most cases,
the refractive index of water can be
considered constant.
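To make eqn. (1.3) concrete, the snippet below computes the refracted angle at an air-water boundary. The indices used (n = 1.0 for air, n ≈ 1.33 for water) are the commonly quoted nominal values and are assumed here purely for illustration.

```python
import math

def refraction_angle_deg(theta1_deg, n1=1.0, n2=1.33):
    """Refracted angle from Snell's law: n1*sin(theta1) = n2*sin(theta2)."""
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    return math.degrees(math.asin(s))

# A ray hitting the air-water boundary at 30 degrees bends towards the normal:
print(refraction_angle_deg(30.0))  # ~22.1 degrees
```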
Refractive index has a very prominent impact on all imaging sensors used in
underwater environment. For standard pinhole cameras, light from the scene is
focused on a point using lens which works on the principle of refraction between the
surrounding medium and the lens. When a camera is immersed in water, the difference
in the speed of light and optical density between the surrounding medium and the lens is
not the same as when working in open air. Due to this difference, the effective focal length
of the lens increases beyond its nominal value, so the scene seems farther than it is and
shows an apparent shift away from the camera.
Figure 2.5: Refractive index of water variation with temperature [16]
Light is composed of multiple colour components, each with its specific wavelength
and a slight variation in speed. When passing from water into the camera lens, each
component of light is refracted at a different rate, which results in the splitting of white
light into its various components. These different colours do not converge at the same
focal point, causing a loss of sharpness and colour saturation. This is called chromatic
aberration [22].
One design consideration for underwater cameras is the selection of the shape of
the transparent housing in front of the camera lens and aperture. If the transparent
housing in front of the camera is a flat surface, the light rays are distorted unequally
as the light rays hitting perpendicularly are not refracted as opposed to the rays hitting
at increasing angles from the normal, which face increasing refraction. This results in
a progressive radial distortion that becomes more evident as wider lenses are used.
These distortions are radially symmetric and can be classified as either barrel distortions,
pincushion distortions or a complex combination of both. These distortions in the
acquired image also generate a progressive blur that increases with large apertures on
wide lenses. For dome-shaped spherical housing that is symmetrical about the
principal axis of the camera, the light rays strike the housing perpendicularly from all
directions, significantly reducing the problems of refraction, radial distortion and
axial and chromatic aberrations. This is especially true if the spherical radius has its
centre at the focal length of the lens. However, the design of a customized spherical
housing is quite difficult.
2.3.3 Effect of Water Salinity and Temperature on Light Transmission
Temperature and salinity of water also affect light transmission. Light absorption
coefficient of water is dependent on temperature and concentration of ions. Correction
coefficients can be used to calculate differences in the water absorption coefficient
for a known difference in temperature and salinity. Light scattering by pure water,
seawater, and some salt solutions have been modelled for nearly all wavelengths of
the electromagnetic spectrum. However, as compared to absorption, scattering by
saline water is negligible for general use, especially in the infrared spectral region as
noted by Rottgers et al. [23]. The temperature and salinity correction coefficients are
approximately ±0.5% °C^-1 and ≤ -0.05% (g/L)^-1, respectively. Therefore, these
effects can be taken as negligible if the depth being
measured is small.
2.4 Optical Depth Imaging and 3D Reconstruction in Underwater
Traditionally, depth imaging underwater is done using non-optical techniques such
as acoustic sonars that are mounted on ships, submarine or remotely operated
vehicles. For bathymetry and underwater surveys, popular imaging sonars are Side-
Scan Sonars or the newer Synthetic Aperture Sonars (SAS). Substantial work has
been done on 3D target reconstruction from side-scan sonars [24], however, because
of the particular properties of light in water and the presence of suspended particles,
sonar images are very noisy, the object boundaries are not uniform, and the contrast
is low [25]. Sonars can work at much larger distances than optical cameras. However,
for 3D imaging, the resolution degrades with distance, and
even the high-resolution sonar scans cannot capture the finer details in underwater
objects and surfaces. Significant research is being done to develop various augmented
sonar technologies such as Multi-beam sonars and Acoustic cameras (vision and sonar
in tandem) [26], to cater for these issues.
Figure 2.6: Popular techniques of active optical 3D imaging
In recent years, however, depth imaging for bathymetry as well as for underwater
3D reconstruction using optical sensors is being widely adopted for various
applications [27]. Popular optical sensing methods that are being used in the underwater
environment are stereo vision (passive optical imaging), time of flight imaging and
structured light imaging (active optical imaging) as given in figure 2.6, which are
discussed in detail in the following sections.
The depth data acquired after scanning from passive or active optical sensors is
spatial and is represented by a point cloud. A point cloud is a set of data points in a
3D coordinate system, with each point representing the distance to a point on the
external surface of an object that reflects a light ray back to the imaging sensor.
Point clouds contain very large numbers of points, and the density of the
point cloud defines the resolution of the 3D image. Sparse point clouds have a smaller
number of data points captured per unit area than dense point clouds. Density of the
point cloud is defined by the depth sensor resolution. The raw point cloud data itself
is not used directly and is converted into a polygonal or triangle mesh model or Non-
Uniform Rational Basis Spline (NURBS) surface etc. through a process commonly
referred to as sparse or dense surface reconstruction. These surfaces or meshes can be
used for visualizing 3D models on the screen or CAD models that can be analysed,
printed using additive manufacturing or developed using any number of
manufacturing processes. The generated mesh is a single object, with no distinction
between objects and surfaces in the scene. The entire process can be summarized by
figure 2.7.
Figure 2.7: 3D scene reconstruction process overview
3D object detection and segmentation algorithms are then applied on the mesh to
distinguish between individual objects and items. For robotic motion and path
planning, plane detection algorithms are used to segment out the floor and walls to
evaluate the safest and most efficient routes to follow to the required destination.
The following sections will discuss the most relevant 3D optical depth sensing
methods within the scope of this research work. Each sensing method has its benefits
and pitfalls when being used underwater, which are discussed in detail.
2.4.1 Structured Light Cameras
The structured light approach is an active optical depth sensing technique. It is a
variation of stereo-vision, where instead of using two imaging devices together, a
single imaging camera and a projector are paired together. A sequence of known
patterns is projected onto an object and gets deformed by the geometric
shape of the object. A typical system uses an LCD projector or other stable light
source to illuminate the scene with a changing pattern of stripes or fixed-point pattern
across an object. A camera in an offset position from the projecting source captures
the frames and calculates the distortions in the light patterns. The resulting distortions
can be processed to form a point cloud and, in turn, a dense point cloud
representation of the object surface. The patterns can be in visible light wavelengths
or in invisible wavelengths. White light or infrared light are the preferred light types
used in structured light sensors.
The advantage of a structured-light scanning system is that it is a speedy process
and the output point cloud is precise. Since the entire scene is illuminated and
captured in an image, the method allows for scanning the entire field of view at once,
and a point cloud is captured in a single snapshot. However, it is generally suited for
static scenes and objects. Furthermore, any ambient light that interferes with the
projected light causes the sensors to acquire false or no data at all. Any background
light can lead to over-saturation for long exposure times causing problems to the
systems in detecting the light pattern Reflective, specular, transparent or light
absorbing surfaces also pose difficulty if the wavelength of light being used passes
through or bounces off the scene’s surface in indeterminate directions. Several
researchers have explored structured light for underwater 3D imaging. Bruno et al.
[28] proposed a method of using structured light with stereo photogrammetry for 3D
reconstruction in underwater environment and showed promising results even in high
turbidity levels, using 25 distinct fringe patterns scanned from stereo cameras.
Sarafraz et al. [29] have recently proposed a new method to estimate the shape of
underwater objects along with the shape of the water surface when the projecting
source is outside of water but the imaging sensor is submerged in water.
2.4.2 Time of Flight Depth Sensors
Time of flight (ToF) depth sensors calculate the depth from the sensor to a point on
the surface by calculating the time it takes for a signal generated from a known source,
to return after bouncing off a subject. Since light travels in air at a constant
speed (c ≈ 2.99 × 10^8 m/s), the distance d covered in time t is given by
d = ct. Therefore, a ray of light emitted at time t_0 by the transmitter is reflected
from a point on the scene surface, travelling back again for the distance d and at time
t it reaches the ToF sensor receiver. The receiver is ideally coincident with the
transmitter. Since at time t the path length covered by the ray of light is 2d, it is simply
divided by 2 to get the depth of the reflecting surface. Another method of
implementing a ToF sensor is transmitting a modulated signal and then calculating
the phase difference between the transmitted and received signals. The phase shift in
the signals is a function of the time difference between the time of the transmitted and
received signal. A detailed working of the phase difference methodology of ToF
sensors has been covered by Jaremo et al. [30].
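The direct (pulsed) variant of the calculation described above can be summarised in a few lines. The sketch below simply halves the round-trip path; the 5 ns round-trip time is an arbitrary example value, not a measurement.

```python
C = 2.99e8  # approximate speed of light in air, m/s

def tof_distance_m(round_trip_time_s):
    """Depth of the reflecting surface from a pulsed time-of-flight
    measurement: the emitted ray travels 2*d, so d = c*t/2."""
    return C * round_trip_time_s / 2.0

print(tof_distance_m(5e-9))  # a 5 ns round trip corresponds to ~0.75 m
```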
Despite the conceptual simplicity of a ToF sensor, its implementation is a big
technological challenge. Multiple issues must be catered for in a ToF sensor. Firstly,
since the calculation depends on reading a reflected signal from the scene, isolating
the exact signal that was sent from multiple reflections and ambient noise is a big
challenge. Secondly, since the signal is electromagnetic in nature, the precise sensing
mechanism and clock speeds have to be extremely fast, accurate and free of any
accumulated errors over time. Lastly, any over exposure of light from any external
source or due to reflection must be cancelled for detecting the original signal. A
popular way to cater for these issues is to use a continuous, modulated IR signal and
then measure the phase difference and amplitude of the reflected signal to measure
the time difference. Using modulated signals specifically helps in isolating the source
signal from ambient IR signals.
Light (or Laser) Detection and Ranging (LIDAR) is the most popular implementation
of time of flight sensors and has seen significant adoption for
bathymetry and underwater mapping purposes. Since light is significantly attenuated
in water, as previously explained in section 2.3, a monochromatic and spatially
coherent laser beam of blue or green wavelength (popularly a 532 nm green laser [31])
is used. The green wavelength has a low absorption coefficient, as detailed in figure
2.3. Therefore, green laser LIDARs have much better depth penetration in water than
lasers of other wavelengths. For underwater topography development, LIDARs
mounted on small aircraft are used. Airborne LiDARs also offer a much wider
scanning area than ship borne acoustic sonars, and are commonly being used to
generate 3D maps of coral reefs and other sub-sea geological features. 3D scanning
using LiDARs is gaining popularity and several successful projects are being done,
with LiDARs mounted on AUVs for autonomous inspection [32]–[37] and 3D scene
construction.
LiDARs provide the means for reliable 3D point cloud generation, aiding in a much
better underwater 3D scene reconstruction, in comparison to techniques like Structure
from Motion (SFM) from monocular or stereo cameras. Even though LiDARs
provide a much better method for sub-marine object and surface 3D reconstruction,
the equipment cost of commercial LiDARs is staggering and therefore out of reach
for small projects that do not have extensive funding or cannot afford investment in
marine-environment-hardened hardware.
2.4.3 RGB-D Cameras
RGB-D (acronym for Red Green Blue and Depth) sensors combine RGB colour
information with depth information. RGB-D sensors are multi-camera systems, with an
RGB camera and a depth camera combined in one package. The depth camera can use
any of the previously explained techniques for depth sensing like structured light or
time of flight. An RGB-D sensor provides multiple time stamped outputs
simultaneously, including a standard RGB image and an intensity image, where the
value of each pixel represents the depth of the corresponding point in the scene. This
depth image can be used to generate a point cloud, for 3D scene reconstruction
processes. Originally designed as a gaming accessory, the RGB-D cameras have
found popular applications in various fields, particularly in robotics. The Kinect sensor
by Microsoft was introduced to the market in November 2010 as an input device for
the Xbox 360 gaming console and was a very successful product with more than 10
million devices sold by March 2011. The Kinect 360 had a 640x480 RGB camera and a
320x240 depth image camera. Its depth sensing uses the structured light method.
The per-pixel depth sensing technology that is used in consumer RGB-D cameras was
developed and patented by PrimeSense® [38]–[40]. ASUS also developed RGB-D
sensors called the ASUS Xtion and ASUS Xtion Live, using the same technology
licenced from PrimeSense. In June 2011, Microsoft released a software development
kit (SDK) for the Kinect, allowing it to be used as a tool for non-commercial products
and spurring further interest in the product.
In September 2014, Microsoft released a new version named Kinect for Xbox One
(also Known as Kinect v2). The sensor employed ToF technology instead of the
structured light sensor and updated the camera to a full 1920x1080 px RGB, while
increasing the depth image resolution to 512×424 px. Technical details of the
KinectToF were released in an article [41], describing the working procedure and
features incorporated in the sensor. Google also announced the Google Tango, a
structured light depth camera in 2014. Following the popularity of these devices, in
2015, Intel released the first RGB-D camera under the Realsense brand. At the time
of writing of this work, Intel has released four versions, available as developer’s kits,
the F200 (2015), R200 (2015), SR300 (2016) and ZR300 (2016). These are also
structured light cameras, working in the infrared wavelength. A comparison of the
popular RGB-D sensors is given in table 2.1.
After the release of the first Kinect, the robotics and computer vision communities
quickly realized that the depth sensing technology in these sensors could be used for
other purposes than gaming. Specifically, since the sensor provided 3D scene
information at a much lower cost than traditionally used 3D depth cameras. But
because most RGB-D sensors are a commercial product, detailed technical
specification and data from the OEMs is often not provided and much information
has to be interpreted after extensive testing. For KinectToF, even though most technical
details have become known through extensive testing, officially little information has
been publicly released regarding the technical specifications of the internals
and working of the sensor. Also, since the Kinect uses a proprietary SoC, the detailed
working and algorithms of the firmware are not available, and have to be interpreted
by researchers. Hertzberg et al. [42] provide calibration of the sensor along with issues
about multiple sensors with overlapping field-of-views. Careful error analysis of the
depth measurements from the Kinect sensor has been made by Khoshelham [43].
Table 2.1: Comparison of popular RGB-D sensors

                   RealSense SR300    RealSense R200     RealSense F200     KinectSL           KinectToF
Released           Mar 2016           Sep 2015           Jan 2015           Jun 2011           Jul 2014
Price              $150               $99                $99                ~$100              ~$120
Tracking method    Structured Light   Structured Light   Structured Light   Structured Light   Time of Flight
Range (m)          0.2 - 1.2          0.5 - 3.5          0.2 - 1.2          0.4 - 8            0.5 - 8
RGB image          1920×1080, 30 FPS  1920×1080, 30 FPS  1920×1080, 30 FPS  640×480, 30 FPS    1920×1080, 30 FPS
Depth image        640×480, 60 FPS    640×480, 60 FPS    640×480, 60 FPS    320×240, 30 FPS    512×424, 30 FPS
Connection         USB 3.0            USB 3.0            USB 3.0            USB 2.0            USB 3.0
Works outdoors     -                  -                  -                  -                  Limited
Skeleton tracking  -                  -                  -                  2 persons          6 persons
Toolkits           Java, JavaScript, Processing, Unity3D, Cinder (RealSense models);
                   WPF, OpenFrameworks, JavaScript, Processing, Unity3D, Cinder (Kinect models)
2.4.3.1 RGB-D Cameras in Underwater Environment
It is interesting to note that even though the first Kinect was released in 2010,
insufficient work has been done on using the Kinect sensor, or any other commercial
RGB-D sensor, in the underwater environment. It should be noted that these
RGB-D sensors generally work in the NIR region, which has a very high absorption rate
in water, as shown in figure 2.3. The absorption and attenuation coefficients of NIR in
water have been calculated by [44], [45] and found to be as follows:
NIR Absorption coefficient in water [a(λ)] = 2.02 m-1
NIR Scattering coefficient in water [b(λ)] = 0.00029 m-1
NIR Attenuation coefficient (k) in water = 2.02 m-1
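Using these published coefficients in the attenuation relation of eqn. (1.1) gives a feel for how little NIR energy survives the sensor-to-target-and-back path. The target ranges below are illustrative only, and the calculation ignores target reflectance, beam spreading and sensor sensitivity.

```python
import math

K_NIR = 2.02  # NIR attenuation coefficient in water, m^-1 (from [44], [45])

def surviving_fraction(target_range_m, k=K_NIR):
    """Fraction of the emitted NIR irradiance remaining after the round trip
    sensor -> target -> sensor through water (eqn. 1.1 with path 2*range)."""
    return math.exp(-k * 2.0 * target_range_m)

for r in (0.35, 0.65, 1.0):
    print(f"range {r:.2f} m -> {surviving_fraction(r):.1%} of the signal left")
```

At a 1 m range less than 2% of the signal returns, which is consistent with the short working distances reported in the studies summarised below.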
Tsui et al. [46] showed experimental results for depth image acquisition and point
cloud generation using the KinectSL and Softkinect DS311 time of flight camera. Both
cameras operate in the 800-830 nm NIR region, and they successfully generated
depth images underwater, with significant noise, up to ranges of ~0.9 m. NIR around
the ~850 nm wavelength has a very high absorption rate in water, explaining the severe
loss of working depth. Dancu et al. [47] demonstrated depth images acquired from
KinectSL for surfaces below the water level. However, the sensor was kept
approximately 0.5m above the surface of water, as the Kinect has a minimum depth
sensing limit of ~50cm. They were able to generate 3D mesh from depth images up
to a distance of 30 cm below water. Butkiewicz [48] has discussed KinectToF infrared
camera distortion model and its issues as well as initial results for 3D scanning in an
outdoor environment and from above the water surface, showing that the KinectToF
can acquire data up to a distance of 1 m.
Digumati et al. [14] were the first to successfully demonstrate 3D
reconstruction from low-cost depth sensors in controlled and real underwater
environment. Using the Intel Realsense structured light depth camera, the authors
demonstrated a system capable of capturing depth data of underwater surfaces up to
20cm. They also present a method to calibrate depth cameras for underwater depth
imaging based on the refraction of light and different medium between the image
plane and the object of interest. They presented two models for calibration of both
structured light and time of flight cameras. However, results are discussed only for the
structured light model, which is processed at a speed of 1 frame per second.
Recently, Huimin et al. [49] proposed an improvement in depth map accuracy and in
the effects due to occlusion, using an underwater channels-prior de-hazing model and
inpainting. A summary of the research done on RGB-D cameras underwater is given
in table 2.2.
Table 2.2: Summary of previous work on RGB-D sensors under water

Tsui et al. [46] (2014) - Sensor: KinectSL. Testing method: container in water. Camera motion: stationary. Results: data captured to a range of ~0.9 m.

Dancu et al. [47] (2014) - Sensor: KinectSL. Testing method: above the water surface. Camera motion: stationary. Results: depth image at 30 cm below water.

Butkiewicz [48] (2014) - Sensor: KinectToF. Testing method: above the water surface. Camera motion: stationary. Results: data up to a depth of 1 m.

Digumati et al. [14] (2016) - Sensor: Intel Realsense. Testing method: fully submerged. Camera motion: 6-DOF (hand-held scanning). Results: 3D data of underwater surfaces up to 20 cm distance, at 1 Hz.

Huimin et al. [49] (2017) - Sensor: KinectToF. Testing method: above the water surface. Camera motion: stationary. Results: solution of occlusion for underwater depth images caused by the relative displacement of the projector and camera.
2.5 Detailed Overview of Kinect RGB-D Sensors
Since there are several commercially available RGB-D sensors, as shown in
table 2.1, the selection of the sensor for this research was made keeping in view several
factors, such as depth sensing technique, depth resolution, ease of availability,
supported toolchain and ease of use. The most user-friendly and most widely accepted
of the RGB-D sensors is the Kinect. Not only is it supported with compatible drivers
with multiple frameworks and operating systems and is easily available, but it also
has the maximum adoption rate among roboticists. Consequently, much detailed
information and help is available for the Kinect, as compared to the other sensors. A
detailed breakdown of Kinect sensor working, comparison between KinectSL and
KinectToF and a brief explanation of the real-time 3D mesh generation algorithm
developed by Microsoft is given in the following sections.
2.5.1 Kinect for Xbox 360 (KinectSL)
The first version of Kinect is a structured light RGB-D camera. It works in the near
infrared domain; the exact wavelength that the Kinect sensor works on is not
available from Microsoft, however it is in the approximately 850 nm range [50]. The sensor
works by projecting a 3x3 times repeated pattern of IR dots. The projected pattern of
dots is then captured in an image with a traditional CMOS camera that is fitted with
an IR-band pass filter. The IR speckle pattern superimposed on the scene is compared
to a reference pattern stored within the Kinect. A view of the internal structure of the
KinectSL is given in figure 2.8.
Objects and various surfaces in the scene that are farther or closer than the
reference plane make the speckles shift and by using a correlation procedure, the
Kinect sensor can detect the disparities between the emitted pattern and observed
positions [51] and thereby the depth displacement at each pixel position in the image.
Each depth pixel is represented by one 16-bit unsigned integer value in which the 13
high-order bits contain the depth value (in millimeters). Any depth value outside the
reliable range or at places where the depth was unable to be calculated (mostly sharp
edges, IR absorbing and reflective surfaces) is replaced with a zero.
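A minimal sketch of unpacking such a pixel is shown below. The example raw value is made up, and the assumption that the three low-order bits simply have to be shifted out follows from the 13-high-order-bit layout described above.

```python
def kinect_sl_depth_mm(raw_pixel):
    """Millimetre depth from a KinectSL 16-bit depth pixel: the 13 high-order
    bits hold the depth, so discard the 3 low-order bits."""
    depth = raw_pixel >> 3
    return depth if depth != 0 else None  # zero marks an unreliable/unknown pixel

print(kinect_sl_depth_mm(0x2AF8))  # hypothetical raw value -> 1375 (mm)
```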
As KinectSL is based on the structured light technology in the NIR region, several
issues are inherent to the sensor. Issues such as dependence on ambient lighting, NIR
reflectivity on glossy surfaces and NIR absorption on certain surfaces result in holes,
rough edges and other irregularities in the output depth images.
Figure 2.8: KinectSL and its internal structure [52]
Furthermore, the RGB camera is only 320x240 pixels, and the USB 2.0 connection
has a low data bandwidth, which causes throughput issues for real-time
processing.
2.5.2 Kinect for Xbox One (KinectToF)
The second version of Kinect, released in 2014, is a time of flight depth sensor and
calculates the time of flight by finding the phase difference of a transmitted signal. Some
official technical details about the working and internals of the sensor were published
in [41], such as the phase difference calculation, audio and 3D processing capabilities,
face recognition and other details of the device. A view of the internal structure of
the KinectToF is given in figure 2.9.
For KinectToF, each pixel in the infrared image sensor has two photo diodes, which
are turned on and off alternately. The IR light source is pulsed in phase with the first
photo diode of the pixel and the reflected NIR signal is detected by the second photo
diode of the pixel that is turned on [53]. The light that is reflected returns with a delay
and phase shift in the signal. The signal intensity measured in each frame is the
difference between the output voltages of the two diodes [41], as shown in figure 2.10.
Figure 2.9: KinectToF and its internal structure [54]
This is used to detect the phase difference of the amplitude modulated light signal,
generated by the NIR laser emitters in the sensor, as briefly mentioned in section 2.4.2
and [55]. The raw depth data returned by the device is in millimeters and ranges
between 0 and 8000 (16-bit unsigned short). The minimum distance KinectToF can
measure is 500 mm and the maximum reliable depth distance returned is 4500 mm;
however, this limit can be disabled to get the full 8000 mm depth data. Depth data is 0 for the
pixels where no depth can be calculated.
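A small helper like the one below is enough to turn such a raw frame into metres while masking invalid pixels. The array shape and values are synthetic, and the 4500 mm cut-off mirrors the reliable range quoted above.

```python
import numpy as np

def depth_frame_to_metres(raw_depth_mm, max_reliable_mm=4500):
    """Convert a KinectToF raw depth frame (uint16 millimetres, 0 = no depth)
    to metres, masking invalid or out-of-range pixels with NaN."""
    depth = raw_depth_mm.astype(np.float32)
    depth[(depth == 0) | (depth > max_reliable_mm)] = np.nan
    return depth / 1000.0

frame = np.array([[0, 750, 5200]], dtype=np.uint16)  # synthetic example frame
print(depth_frame_to_metres(frame))  # [[ nan 0.75  nan]]
```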
Figure 2.10: The 3D image sensor system of KinectToF [55]
Furthermore, KinectToF has a high dynamic range, to cater for different reflecting
surfaces and properties, and uses multiple transmitted frequencies of approximately
120 MHz, 80 MHz and 16 MHz to eliminate depth sensing errors and aliasing of the
different modulated frequencies. Kinect specifications cite a maximum exposure time
of 14ms, and a maximum latency of 20ms to transfer each exposure data over USB
3.0 to the host system. The depth measured and the phase shift in the signal are related
by equation (1.4) by [55], relating depth (d), speed of light (c) and frequency of
modulation (fmod):
d = (phase / 2π) × c / (2 f_mod)   (1.4)
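The sketch below evaluates eqn. (1.4) directly. The 80 MHz modulation frequency is one of the approximate frequencies mentioned above, and the phase value is an arbitrary example.

```python
import math

C = 2.99e8  # approximate speed of light, m/s

def phase_to_depth_m(phase_rad, f_mod_hz):
    """Depth from the phase shift of the modulated signal, eqn. (1.4):
    d = (phase / 2*pi) * c / (2 * f_mod)."""
    return (phase_rad / (2.0 * math.pi)) * C / (2.0 * f_mod_hz)

# The unambiguous range at 80 MHz is c / (2*f_mod) ~ 1.87 m, so a phase
# shift of pi maps to roughly half of that:
print(phase_to_depth_m(math.pi, 80e6))  # ~0.93 m
```

Because the phase wraps every 2π, a single modulation frequency is ambiguous beyond c / (2 f_mod); combining the three modulation frequencies, as noted above, resolves this aliasing.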
2.5.3 Comparison of Kinect Devices
Microsoft as an OEM has not explicitly released detailed technical specifications of
the internal hardware components and structure [50], and most of the information,
particularly for KinectSL, has been inferred by teardowns and manual inspection by
enthusiasts. Significant research has been done to evaluate the technical and metrological
differences between the two sensors; a comparison of both versions of Kinect is
summarized in table 2.3.
Effect of different variables, such as effects of varying illumination, the effect of
the sensor and ambient temperature, various calibration parameters, including IR
sensor and RGB camera calibration, etc. have been discussed in detail by the authors
in [50], [53], [55]–[60]. Both versions of Microsoft Kinect utilize infrared emitters in
the Near Infrared (NIR) region with a wavelength of 850nm [50]. It is also interesting
to note that the KinectToF is also capable of working in outdoor condition [56], unlike
the first version.
Figure 2.11: Kinect measured depth vs actual distance
It should be noted that the depth calculated in both versions of Kinect is the
perpendicular distance from the object to the camera-laser plane rather than the actual
distance from the object to the sensor [61] as shown in figure 2.11. A detailed
comparison table of the two Kinect sensors is given in table 2.3.
KinectToF improves several things over the first version, technically and
otherwise. Firstly, the sensing method has been enhanced to be much more reliable
and accurate than the structured light method in KinectSL. Even though the depth
sensor has slightly smaller resolution than the first version, a much wider field of view
of the depth sensor doubles the average pixel per degree resolution, thereby giving an
improved scanning resolution. A wider field of view also eliminates the need of an
actuated base. Secondly, the upgrade to USB 3.0 standard means a much higher data
throughput, enabling real-time higher resolution scanning of the scene. The details of
internal hardware and System on Chip (SOC) used on board the device have generally
been speculated in numerous teardowns by hobbyists and researchers alike.
Researchers have identified detailed hardware specifications such as intrinsic camera
parameters, calibration methods, metrological characteristics and performance
parameters of the depth sensor. A quick summary of these works is given in table 2.4.
Table 2.3: Specification comparison of KinectSL and KinectToF

                                           KinectSL               KinectToF
IR / Depth camera
  Imaging sensor                           Aptina MT9M001 CMOS    -
  Pixel size                               5.2 μm                 10 μm
  Resolution (px)                          640 × 480              512 × 424
  Field of view                            57.5˚ × 45˚            70˚ × 60˚
  Average px/degree                        10 × 10                5 × 5
  Angular distortion                       -                      0.14 / px
  Maximum reliable depth (maximum depth)   4.5 m (8 m)            4.5 m (8 m)
  Minimum depth                            0.4 m                  0.1 m
RGB camera
  Imaging sensor                           Aptina MT9M112 CMOS    -
  Pixel size                               2.8 μm                 3.1 μm
  Resolution (px)                          320 × 240              1920 × 1080
  Field of view                            62˚ × 48.6˚            84.1˚ × 53.8˚
  Average px/degree                        22 × 20                7 × 7
  Frame rate                               30 fps                 30 fps
General
  Sensor type                              SL                     ToF
  IR spectrum (nm)                         ~850                   ~850
  USB standard                             2.0                    3.0
  Tilt degree                              ±27˚                   -
  Number of persons tracked                2                      6
  Kinect SDK version                       v1.8                   v2.0
  Can work in outdoor environments         No                     Yes
Table 2.4: Previous work done on characterizing KinectToF properties

[42] (2014) - Sensor used: PMD Camboard Nano ToF camera. Accomplishments: modelling of a time-of-flight camera and methods to calibrate and compensate modelled effects, including optical effects, scattering, etc. Remarks: propose a sensor model of time-of-flight cameras as well as methods to calibrate and compensate all modelled effects.

[15] (2015) - Sensor used: KinectToF. Accomplishments: detailed survey of KinectToF characteristics and calibration approaches. Remarks: highlight errors arising from the environment, the properties of the captured scene and pre-heating time; perform geometric and depth calibration.

[60] (2015) - Sensors used: KinectSL (models 1414 and 1473). Accomplishments: experimental study to investigate the performance and characteristics of three different models of Kinect, such as accuracy and the effect of temperature on 3D models. Remarks: deviation of depth with operating temperature is highlighted.

[56] (2015) - Sensors used: KinectToF, Asus Xtion Pro. Accomplishments: comparison of accuracy, detection rate and pose estimation between first generation (structured light) and second generation (time of flight) depth sensors. Remarks: demonstrate that KinectToF has higher precision and less noise under controlled conditions and can work outdoors.

[62] (2016) - Sensor used: KinectToF. Accomplishments: investigations on pixel errors in the depth image, error between real and measured distances, errors due to incident angle, target type, colour and materials; reconstruction of generic shapes and errors. Remarks: the influence of the KinectToF sensor temperature on the returned depth values is a valuable result.

[63] (2015) - Sensor used: KinectToF. Accomplishments: influence of frame averaging, pre-heating time, materials and colours, outdoor efficiency; work on geometric and depth calibration. Remarks: show that the depth measurements achieved are more accurate compared to the first Kinect device.

[53] (2015) - Sensors used: KinectSL, KinectToF. Accomplishments: metrological comparison between KinectSL and KinectToF and their accuracy and precision. Remarks: show a decrease of precision with range according to a second order polynomial equation for Kinect I, while Kinect II shows much more stable data.

[64] (2015) - Sensors used: KinectSL, KinectToF, SR4000, Structure Sensor. Accomplishments: comparison of average errors between the sensors and the effect of colour on depth sensing. Remarks: the average Euclidean error is smaller for the ToF sensors and is almost constant along different distances to the target.
2.5.4 3D Scene Reconstruction Using Kinect Fusion
Kinect Fusion, developed by Newcombe et al. [65], is a real-time complex and
unstructured scene mapping system using low-cost depth sensors and graphics
hardware. It was released with the Kinect SDK, as a real-time 3D scanning tool for
developers. At its core, Kinect Fusion is a Simultaneous Localization and Mapping
(SLAM) system, that can do two things in parallel; localization (to estimate the
current and up to date pose of the sensor with respect to the current scene) and
mapping (expand and improve the current map of the scene using global optimization
techniques), in real time. Using the depth data from the RGB-D sensor, Kinect Fusion
fuses each depth map into one single 3D volume or mesh, to form a dense global
surface, when the scene is scanned from the Kinect sensor from multiple viewpoints.
Therefore, it provides 3D object scanning and model creation using a Kinect sensor.
This 3D surface is nonparametric and provides surface and sensor orientation
representation in the field of view which can be useful for physical interaction,
especially in the field of robotics.
2.5.4.1 Brief Working of Kinect Fusion
As the Kinect is moved around in the scene, the pose of the sensor is tracked as it is
moved. Since each frame's pose and its relation to the previous scene is known,
multiple viewpoints of the environment can be integrated together. The overall
working of the algorithm is given as shown in figure 2.12 and described in the
following paragraphs.
The input and first step for the algorithm is the 3D depth data from the Kinect
sensor. This data is in the form of a point cloud and consists of 3D points/vertices (in
camera coordinate system) and the orientation of the surface (surface normals) at
these points. The raw depth information is stored in a voxel, a data point on an evenly
spaced 3D grid. Note that it only represents a single point and not volume information.
Figure 2.12: Kinect Fusion overall workflow as given by Newcombe et al [65]
The spacing of the voxels (voxels per meter) is defined by the depth resolution of
the scene being scanned. Next, the location of the sensor in global coordinates is
tracked as the sensor is moved across the view, so the current position of the sensor
with respect to the initial point is known. This is done by iteratively aligning the
acquired and the last measured surface using the Iterative Closest Point (ICP)
algorithm. ICP minimizes the difference between two clouds of points and each frame
is transformed to best match the reference [65]. The algorithm iteratively revises the
transformation matrices (translation and rotation) needed to minimize the distance
from the source to the reference point cloud. An illustration of this concept is given in
figure 2.13.
Figure 2.13: ICP for aligning point clouds acquired by Kinect
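The core of each ICP iteration, once correspondences have been chosen, is a closed-form least-squares rigid alignment. The sketch below shows that sub-step using the SVD-based (Kabsch) solution; it is a simplified illustration of the idea, not the GPU point-to-plane variant that Kinect Fusion actually uses, and the function names and test values are assumptions made here for the example.

```python
import numpy as np

def best_rigid_transform(source, target):
    """Least-squares rotation R and translation t mapping corresponded source
    points onto target points (both N x 3). Finding the correspondences
    (e.g. by nearest neighbour) is the other half of an ICP iteration."""
    src_c, tgt_c = source.mean(axis=0), target.mean(axis=0)
    H = (source - src_c).T @ (target - tgt_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                    # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = tgt_c - R @ src_c
    return R, t

# Illustrative check: recover a known small rotation and translation.
rng = np.random.default_rng(0)
cloud = rng.random((200, 3))
angle = 0.1
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
moved = cloud @ R_true.T + np.array([0.02, -0.01, 0.03])
R_est, t_est = best_rigid_transform(cloud, moved)
print(np.allclose(R_est, R_true, atol=1e-6), t_est)
```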
The third stage is fusing of the depth data from the known sensor pose into a single
volumetric representation of the space around the camera. This integration of the
depth data is looped and performed on every nth frame acquired (frames can be
dropped to increase performance in static or slow translating environments). A
bilateral filtering process is applied to the acquired data to smoothen out the point
cloud before integration and remove erroneous data points. As the sensor moves
across the scene from slightly different viewpoints, any gaps or holes where depth
data was not present in the previously acquired frames or generated mesh is also filled
in. Previously created surfaces increase in resolution and are refined as more data
comes in over successive frames.
In the last stage, the fused frames in a mesh need to be rendered on screen. The
rendering technique used is called Volumetric ray casting. It is an image-based
volume rendering technique that computes 2D images from 3D volumetric data for
display on the screen. It must not be confused with ray tracing, which renders surface
data only. For volumetric rendering, the tracing ray passes through the object,
sampling the material along the ray path. There are no ray reflections or secondary
rays sprouting from the main rays. The resultant mesh can be shaded according to the
variation of depths along the surface or direction of the surface normals. Typical
volume sizes that can be scanned are up to around 8m3 while typical real-world voxel
resolutions can be up to around 1-2mm per voxel. However, it is not possible to have
both simultaneously, because the algorithm is memory dependent. Kinect Fusion
stores the data in GPU memory in a voxel grid. Therefore, the limiting factor
becomes the available memory.
2.5.4.2 Tracking Performance and Reconstruction Volume
The most important point for Kinect fusion is that for tracking purposes, it only
uses the depth data which is captured using the Kinect's infrared sensor. Since RGB
is not used, the lighting conditions of the scene have no effect on tracking. This also
enables Kinect fusion to work in pitch dark conditions. However, for the tracking to
work, the scene must have enough texture or variations that are visible in infrared. If
there is a lack of texture, like a plain wall or surface, the tracking fails to extract any
features to align the frames to and therefore loses the track. A cluttered,
unstructured scene gives the best results. In addition to the requirement of a textured
scene, the scanning rate also plays a part in maintaining a stable tracking output.
Movement in small and slow increments helps the acquisition of closely aligned point
clouds and lowers the alignment cost per frame. If tracking is lost, the camera can
be repositioned to a previously aligned frame to recover any lost tracking in real time.
Optionally, the RGB image can be aligned to the captured depth image and the
generated 3D mesh. After alignment, the RGB colours of each pixel can be overlaid
on the mesh, to generate a coloured 3D mesh. This may be processed to create a
coloured rendering or for application in additional vision algorithms such as 3D object
segmentation.
To store the depth information, the number of voxels that can be generated
depends on the amount of memory available on the host computer. The resolution in
the three axes can be different and can be set to encompass the area required to scan.
The size in real world units that one voxel represents is defined by 'voxels per meter'
(vpm). So, a (512)^3 voxel volume can represent a cube of the real world 4 m on a side if the
vpm is set to 128 (512/128 = 4 m). In this case, a single voxel will cover a cube of
4 m/512 ≈ 7.8 mm on a side. The same (512)^3 volume can be set to represent a 2 m cube of the
real world at a much higher resolution of 256 vpm (512/256 = 2 m); each voxel would then
cover a cube of 2 m/512 ≈ 3.9 mm on a side. The voxels per meter can be set independently for
the three axes. However, scanning an extensive area at a very high resolution is not
possible, since the number of voxels required to scan a large area would increase
beyond control. There are methods to use multiple devices and multiple GPUs, each with
their own contiguous memory space; however, that is beyond the scope of this research
work. A visual representation of a cubic voxel volume is given in figure 2.14.
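The voxel-per-meter arithmetic above is simple enough to capture in a couple of lines; the function and values below are only an illustration of the trade-off, not part of the Kinect Fusion API.

```python
def reconstruction_scales(voxels_per_axis, voxels_per_metre):
    """Edge length of the scanned cube (m) and of a single voxel (mm) for a
    given voxel count per axis and 'voxels per meter' (vpm) setting."""
    cube_edge_m = voxels_per_axis / voxels_per_metre
    voxel_edge_mm = 1000.0 / voxels_per_metre
    return cube_edge_m, voxel_edge_mm

print(reconstruction_scales(512, 128))  # (4.0 m cube, ~7.8 mm voxels)
print(reconstruction_scales(512, 256))  # (2.0 m cube, ~3.9 mm voxels)
```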
Figure 2.14: A cubic volume is subdivided into a set of voxels which are equal in
size and defined per axis. [66].
2.5.5 3D Reconstruction Algorithms for RGB-D sensors
After the release of Kinect Fusion, several researchers have developed similar and
improved open and closed source implementations. Several other techniques are also
being developed by researchers for SLAM and 3D scene reconstruction. A summary
of some popular alternatives is given in table 2.5.
Table 2.5: Summary of related work done on scene reconstruction and mapping

[65] (2011) - Techniques: Phong-shaded renderings, bilateral filtering, GPGPU, TSDF (Kinect Fusion leading paper). Accomplishments: using depth only, it is completely robust to indoor lighting scenarios; uses all frames of Kinect data; using every 6th frame, 64 times less GPU memory is used by reducing the reconstruction resolution to 64^3. Limitations: the volume generated is limited to the memory available for the number of voxels.

[67] (2012) - Techniques: TSDF, CSP odometry (Kintinuous). Accomplishments: improves on Kinect Fusion's voxel limitation; not restricted to a small volume by the available memory on the GPU; instead, the volume moves through space along with the observer and a triangular mesh is created for slices that leave the volume; combines ICP-based tracking with dense colour information to achieve robust tracking in a large variety of environments. Limitations: loop closure detection and mesh re-integration.

[68] (2011) - Technique: point cloud. Accomplishments: manual calculation of the point cloud, transforming the current position in the real world using a transformation matrix calculated from SURF on the camera pose. Limitations: slow; point cloud updates only after several frames; creates a point cloud but not a mesh.

[69] (2013) - Techniques: bilateral filtering, Kinect Fusion, TSDF. Accomplishments: bilateral filtering on noisy Kinect data to reduce and fill small holes.

[70] (2012) - Technique: volumetric voxel representation (using KinectSL). Accomplishments: uses the Octomap library to generate a volumetric representation of the environment; open-source; does not assume odometry data availability.

[71] (2014) - Techniques: POVRay renderer, Kintinuous. Accomplishments: uses depth and RGB noise modelling; open-source.

[72] (2013) - Technique: occupancy grid maps. Accomplishments: proposes a unified framework for building occupancy maps and reconstructing surfaces. Limitations: high storage complexity of occupancy maps in memory.

[73] (2013) - Technique: voxel hashing algorithm. Accomplishments: the fusion of spatial and visual data using a framework; building elements are recognized based on their visual patterns; the recognition results can be used to label the 3D points, which could facilitate the modelling of building elements. Limitations: illumination effects on the segmentation and clustering results could not be completely removed.
2.6 Proposed Methodology
As it has been discussed in the sections above, the objectives of this proposed research
are to investigate the performance of KinectToF sensor in underwater environment.
Once the working of the sensor has been found to be satisfactory, a solution for real-
time 3D scene reconstruction in underwater environment will be developed. The
algorithm is proposed to be based on the popular Kinect Fusion algorithm, but as it is
not designed to work in a medium other than air, it must be adapted to cater for the
new working medium. Furthermore, all the unwanted effects that adversely affect the
performance of sensors underwater must be catered for. This includes noise removal
on the acquired point cloud, a correction for the refraction effect of water and the
housing material itself etcetera. To execute this research work, the methodology will
therefore have two distinct parts, as shown in figure 2.15.
The first part will be the design of a custom-built casing so that operation of the
device isn’t hampered in any way in underwater environment. The casing material in
the field of view of the Kinect must be selected so that it has minimum effect and
absorption on near infrared wavelength. Secondly, since the pressure sustained by
submerged objects increases linearly with depth, the casing has to be able to sustain
such pressures for at least 3-4m depth so the data acquisition process can be done
without any issue. The design of this tailored casing has been deliberated at length in
chapter 3, as part of the research methodology.
Secondly, since insufficient research has been done in the area of working of
commercial depth sensors under water, there remains a large research gap and a
significant lack of data on the effective performance and issues faced in such devices.
Taking inspiration from working of RGB cameras in water, it is speculated that issues
like refraction, noise due to turbidity, absorption and scattering of NIR wavelength,
etc. will be faced with varying effect and therefore corrections for these problems will
have to be devised as part of the research work.
Figure 2.15: Workflow for generating real-time 3D meshes from Kinect sensor, in
under water environment. Coloured blocks are the contributions of this research.
To utilize Kinect Fusion for 3D scene reconstruction in underwater environment,
additional filters need to be appended to the framework to cater for the effect of
multiple refraction due to the three mediums. Results of the data acquired and
corrected will be analysed qualitatively and quantitatively by comparison to meshes
created by Kinect fusion and KinectToF in different environments and conditions, such
as clear and turbid waters etc. The results, discussed in detail in chapter 5, show the
effectiveness of the proposed methodology and create further avenues of research
that can be explored in future work.
2.7 Summary
Depth sensing is a necessary task not only for understanding sub-sea surfaces
and features but also for developing autonomous robotic solutions and improving the
state of the art in robotics. A brief overview of depth sensing techniques has been
covered in this chapter, with emphasis on non-contact optical depth sensing. The effect
of water on the propagation of light, the effects of temperature and salinity, and
the resulting impact on underwater imaging have been discussed in detail. Different
optical imaging techniques and sensors have been discussed,
including structured light and time of flight cameras and commercial RGB-D
cameras. Focusing on the RGB-D sensors, the Microsoft Kinect sensors (both
KinectSL and KinectToF) have been compared in detail. The working of Kinect Fusion
has been explained and other inspired algorithms have been compared to understand
their pros and cons. Lastly, the proposed methodology of this research has been outlined,
as it will be explained in the following chapters.
WATERPROOF CASING DESIGN AND DATA PROCESSING PIPELINE FOR
UNDERWATER 3D DATA
3.1 Overview
The methodology can be divided in two distinct parts. Firstly, a custom waterproof
housing for Kinect had to be designed and prototyped for data acquisition underwater,
with minimum effect on the imaging data. Second, the RGB and Depth cameras must
be calibrated, since the working of optical cameras changes when used in a denser
medium than air. In addition to the calibration issues, the change in working medium
adds several unwanted issues such as enhanced distortion, change in time of flight
calculations and refraction of the rays of light. These hardware and software issues
and their developed solutions are discussed in detail in the following sections.
3.2 Design and Prototyping of Waterproof Housing
The KinectToF does not have any visible or commonly known Ingress Protection (IP)
rating information available. From several teardowns that are available online, it can
be assumed that the device is rated at IP5x or IP6x, with much more emphasis on dust
protection and none to limited splash protection. Since significant processing is done
onboard the Kinect’s specialized DSP processor, it also requires some heat dissipation
via air circulation, which is achieved by a 5V 40mm DC Fan. As the electronic device
is clearly not meant to handle water ingress of any sort, a specialized waterproof
casing has been designed to take the sensor underwater, as proposed in section 2.6.
The waterproof and transparent housing has been designed so that not only can it
protect the electronic hardware, but also enable clear and obstruction free field of
view to the on-board IR and RGB sensors. Specific details of the design decisions,
material selection, water ingress protection for the casing as well as the measured
effects of the casing on the Kinect's field of view (FOV) are discussed in the following
sections. The design of the casing has been done keeping in view the operational
requirements of Kinect in water, namely:
- The Kinect should be easy to handle around the scene or objects of interest to fully
  capture the depth data.
- There should be an obstruction-free and clear view in front of the IR and RGB cameras.
- Air flow for cooling the onboard electronics should be ensured in the sealed
  environment while the device is being operated underwater for a longer operational
  time.
- Since the Kinect needs external power and a USB connection to the host computer
  for real-time processing, the point where the cable comes out of the housing must be
  properly sealed against water pressure, which increases with the depth of 3D scanning.
- Lastly, the casing should allow easy access to the device along with easy removal and
  resetting to the desired place again, if required.
3.2.1 Transparent Material Selection
The most important part of this casing is the front face, which must offer obstruction
free, clear and distortion free view to the NIR and RGB cameras. Also, since Kinect
is an NIR device, the transparent material should have a high NIR transmission
percentage. It was therefore essential for the proper material to be selected. Other than
the transmission, the following factors were considered in selecting the housing
material, in order of importance:
- NIR transmission percentage
- Refraction index
- Density
- Thickness available
- Malleability and machinability
- Adhesion to other materials
- Availability
The first clear choice is glass, however there are several issues in selecting it as a
housing material. Firstly, the brittleness of a normal thin, light glass casing would be
impractical for a robust casing design. To increase the strength, the thickness of the
glass would have to be increased, or specialized glass casings of hardened glass such
as Pyrex would be required, which in turn would increase weight. Also, glass would
make it difficult to get a custom shaped and custom designed piece manufactured.
The second most relevant material option is Perspex (also known as Acrylic), a
thermoplastic material that is easily available and malleable. Perspex provides an
obstruction-free view to the cameras. It has a refractive index of 1.49 and is
light weight, with a density of 1.18 g/cm3. A 2mm thick Perspex sheet would also
provide a strong and robust material for trouble-free use underwater.
Figure 3.1: Transmission percentage of different wavelengths of light through 3mm
Acrylic [74]. The red band is NIR wavelength used by KinectToF
According to data available from manufacturers, the transmittance (Γ) of NIR
wavelength is approximately ~95% for the entire NIR range [74], and the decrease
starts to occur further into the infrared region, as shown in figure 3.1. Transmittance is
the ratio of the intensity of light passing through a medium to that of the incoming light
in air, and is given by equation (3.1) [75].
Γ = I_acrylic / I_air   (3.1)
The absorption coefficient is related to the transmittance according to the
relationship given in equation (3.2):
Γ = e^{-kz}   (3.2)
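Equations (3.1) and (3.2) can be combined to back out an effective absorption coefficient from a quoted transmittance figure. The sketch below does this for the ~95% value cited above, treating the 3 mm sample thickness as z and ignoring surface reflection losses, so the numbers are indicative only.

```python
import math

def absorption_coefficient(transmittance, thickness_m):
    """Effective absorption coefficient k from eqn. (3.2):
    gamma = exp(-k*z)  =>  k = -ln(gamma) / z."""
    return -math.log(transmittance) / thickness_m

k_acrylic = absorption_coefficient(0.95, 0.003)   # ~17 m^-1 for the 3 mm sample
print(k_acrylic)
print(math.exp(-k_acrylic * 0.002))               # predicted transmittance of a 2 mm sheet, ~0.97
```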
Therefore, a Perspex housing will absorb very little of the light emitted by the Kinect's
NIR source. This was also experimentally verified by testing different thicknesses of
Perspex sheets (2 mm, 3 mm, 5 mm and 8 mm), as compared in figure 3.2, using the out-
of-the-box Kinect Fusion sample application provided by Microsoft in the SDK.
Comparing the accuracy of the reconstructed mesh, it is noted that there is nearly
negligible distortion in the depth map reconstructed for all thicknesses of Perspex.
However, the Perspex sheet should preferably be touching the Kinect's face or be very
close to it. If there is a gap of more than 4-5 mm between the Kinect and the Perspex,
there is a noticeable loss of depth data from the edges of the image. Furthermore, if
the sheets are further from the front face, internal reflection of the IR rays occurs
directly in the area in front of the source. This leads to loss of depth data in the centre
of the image and the reconstructed mesh also.
Figure 3.2: 3D mesh reconstruction results in open air through various thicknesses
of Perspex (a) Original scene (b) No Perspex (c) 2mm (d) 3mm (e) 5mm (f) 8mm
3.2.2 Casing Structural and Sealing Design
The casing for Kinect has been designed in a modular fashion, as shown in figure 3.4.
It consists of two side housing holders that hold the Perspex face in place and double
as the main structural part of the casing. A slot is designed in the housing holder and
the arms to ensure that the Perspex sheet remains fixed. The slots are also filled with
clear automotive silicone paste, acting as a seal of the casing to prevent water entering
the case. The device itself is held firmly in place with two exactly sized Kinect holders,
designed to fit inside the Perspex housing, which limit any vertical and lateral
movement of the Kinect while scanning objects under water.
KinectToF has air intakes on the sides of the device and a 40mm fan at the rear for
hot air exhaust, to maintain air flow for cooling the on-board processor and other
electronics. The designed Kinect holders allow almost unrestricted airflow into the sides
of the Kinect for cooling purposes. The back plate is also cut from an 8mm Perspex sheet
and is held firmly in place with 16 × M5 bolts distributed around the back side, over a
layer of clear automotive silicone paste. There is a gap of approximately 50mm between
the back plate and the Kinect's rear-mounted exhaust fan, to allow for air circulation
inside the casing during operation. Due to the unavailability of cable glands of the
required size, a custom cable seal was designed as a 4-part snap-fit assembly with five
rubber washers installed to ensure waterproofing of the cable entering the casing.
Designing a functional waterproofing seal was the most difficult part of the entire
assembly and required several design revisions after initial testing. Section and exploded
views of the cable seal are given in figure 3.3, while the complete assembly and exploded
view are given in figure 3.4. Detailed drawings of the casing sub-assembly are given in
Appendix A.
Kinect is powered by a proprietary power adapter from Microsoft, bundled with the
Kinect for Windows Adapter that is required to access the data streams of the KinectToF
sensor. To meet the power requirements during the underwater experiments, the mains
adapter was replaced with a standard 12V 7Ah sealed lead-acid battery.
Figure 3.3: Cable gland design (a) Exploded view (b) Cross-section view
Figure 3.4: Designed housing assembly (a) housing only (b) housing with KinectToF
(c) exploded view of the assembly
3.2.3 3D Printing Considerations
The structural protective housing has been designed and prototyped for Kinect using
Fused Deposition Modelling (FDM) 3D printing technique. The material selected was
ABS (Acrylonitrile Butadiene Styrene), as opposed to Poly Lactic Acid (PLA)
material, as the latter can absorb water and humidity in due time. The printing was
done using a 3DSystems® CubePro Duo which is a Fused Deposition Modelling
(FDM) additive manufacturing technique based 3D printer. The 3D print was done
with a layer resolution of 0.7μm and 80% solid infill, 5 top and 6 bottom layers to
reduce the chance of porosity and a diamond hatched support print pattern to increase
overall structural strength. Because of the way FDM 3D printed objects are
manufactured (layer by layer stacking of linear fused plastic extrusions), there is a
high chance of porosity between the printed layers even after fusion of the material
layers, as shown in figure 3.5 below. To cater for any such non-visible inaccuracies,
the printed parts were coated with multiple layers of aerosol solvent based paint and
covered with a coat of clear lacquer. These methods helped fill in any invisible
porosity between the printed layers and add a protective layer against water intrusion.
Figure 3.5: Zoomed in view of the porosity between fused 3D printed layers due to
FDM process
3.2.4 Structural Analysis of Designed Housing
As static pressure exerted on the walls of a submerged object is dependent on the
density ρ, acceleration due to gravity and depth of the object. As the depth d increases,
the pressure distribution P on the surface of the object becomes asymmetric along the
height of the object and is more at the lower half as compared to the top half, as shown
in figure 3.6(a). As this is a linear trend, therefore the pressure that will be exerted on
the Kinect housing can be shown from the figure 3.6(b).
Figure 3.6: (a) Increasing pressure exerted on a submerged object in water (b)
simulated linear relationship of pressure in water and depth
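The linear pressure-depth relationship of figure 3.6(b) follows directly from P = ρgd. The short sketch below (using the standard density of fresh water and gravitational acceleration, which are assumptions rather than values taken from the simulation files) evaluates the gauge pressure at the shallow test depths considered here:

using System;

class HydrostaticPressure
{
    static void Main()
    {
        const double rho = 1000.0;   // density of fresh water, kg/m^3
        const double g = 9.81;       // gravitational acceleration, m/s^2
        const double atm = 101325.0; // 1 atm in Pa

        // Gauge pressure P = ρ·g·d at the shallow test depths used for the housing
        foreach (double depth in new[] { 1.0, 3.0, 5.0 })
        {
            double pascals = rho * g * depth;
            Console.WriteLine($"d = {depth} m : P = {pascals / 1000:F1} kPa ({pascals / atm:F2} atm)");
        }
    }
}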
To assess the pressure that the housing can sustain, 3D structural strength
simulations using Finite Element Analysis (FEA) were carried out. The respective
materials for the parts, such as Perspex for the window and silicon nitride for the
washers, were assigned according to the exact specifications of the materials available.
As the prototype was intended for data collection at relatively shallow depths of 3 to 5
meters, the corresponding pressure was simulated, which translates to a gauge pressure
of roughly 0.3 to 0.5 atm. The stress analysis and simulation results show that a 4 mm
wall thickness for the casing holders, as well as for the cable glands, would suffice to
sustain the pressures exerted on the casing. As can be seen from the results in figure 3.7,
the most prominent deflections occur at the centre of the holder.
It must be noted that the Von Mises stress and deflection results shown here were
simulated for a hollow casing design with a wall thickness varying from 1mm to 4mm.
Von Mises stress is a theoretical measure of stress used to predict yielding of materials
under a given load condition from the results of simple axial tensile tests. A hollow design
was simulated because the support structure of a 3D print is generated only when the
g-code is produced from the model files, and there is no direct way to duplicate the exact
diamond support structure inside a hollow part unless it is explicitly designed as part of
the model. Consequently, if the hollowed-out parts can sustain the design pressures in
simulation, the support structure inside the actual print will further strengthen the parts,
adding a much larger safety factor than the simulated one and allowing the housing to
sustain the stresses at depths of 10 to 15 ft below water. While it is understood that these
simulations do not consider the manufacturing properties of a layer-based 3D printed
model, they nevertheless increased confidence in the stability and strength of the
housing design.
Figure 3.8: Structural strength analysis of the designed cable gland (a) Inside edge
Von Mises Stress distribution (b) Outside surface Von Mises Stress distribution (c)
Inside edge displacement due to pressure (d) Outside surface Displacement due to
pressure
A similar stress analysis was also carried out on the cable gland and washer assembly,
shown in figure 3.8, to verify that the glands can sustain the pressure. As seen from the
results summarized in table 3.1, the cable glands sustain the pressure easily without any
buckling, and the stress is transferred to the rubber washers, which further strengthens
the seal.
Table 3.1: Stress analysis simulation results summary

                                  Kinect Holders              Cable Gland
  Name                            Minimum       Maximum       Minimum       Maximum
  Volume (mm3)                    1091530       -             327952        -
  Mass (kg)                       1.233         -             0.388629      -
  Von Mises Stress (MPa)          2.77E-08      3.51E+00      1.98E-20      9.22E+00
  1st Principal Stress (MPa)      -1.97E+00     2.68E+00      -4.46E+00     6.55E+00
  3rd Principal Stress (MPa)      -4.70E+00     8.36E-01      -8.95E+00     7.23E-01
  Displacement (mm)               0.00E+00      1.16E-01      0.00E+00      6.00E-03
  Safety Factor                   5.69727       15            4.62473       15
3.3 Refraction Correction and Distortion Removal in Underwater 3D Data
The following sections address camera calibration and the filters developed for noise
removal, and describe how the various effects of water as an imaging medium, namely
distortion and refraction, are accommodated.
3.3.1 Kinect RGB and Depth Camera Underwater Calibration
Before the camera calibration itself, some basic concepts used in calculating the final
results are introduced; the methodology for calibrating the Kinect's RGB and depth
cameras is then discussed in detail. Note that since the depth detected by Kinect is
calculated from images taken by the infrared camera, calibrating the IR camera
automatically means that the depth images incorporate the calibration corrections.
3.3.1.1 Camera Calibration Concepts
Imaging cameras are ideally modelled as a pin-hole camera, where light from the
scene passes through a virtual pinhole and forms an inverted image on a plane behind
the pinhole. In real-world cameras, however, to capture the maximum light from the
scene needed to form a clear image, the light passes through the camera aperture and
is focused onto a virtual pinhole by a lens. The image is formed on a plane that lies on
an imaging sensor, as shown in figure 3.9. Since this deviates from the ideal camera
model, the lens and imaging sensor have inherent properties that affect the captured
image. To cater for the unwanted effects of these properties, the cameras need to be
calibrated before use. Through calibration, we can estimate the internal parameters of
the installed lens and the imaging sensor properties required to form an image. The
complete camera parameters can be described by intrinsic parameters (covering the lens
position and orientation, imaging sensor orientation and image capturing properties, and
the distortion coefficients of the lens) and extrinsic parameters (the 3D coordinates of
the camera relative to the scene). The intrinsic parameters and distortion coefficients are
unique to each device and are used to calculate and apply a correction to the acquired
image. This is a necessity for detecting and calculating the real-world sizes of objects in
the scene. Camera calibration is also essential for robotics, navigation systems and 3D
scene reconstruction.
Figure 3.9: Camera calibration process
To project a point in 3D space on to a 2D image plane, perspective projection and
the pinhole camera model are used, which can be defined by equation (3.3):
    z · [x, y, 1]ᵀ = K · [R | T] · [X, Y, Z, 1]ᵀ                     (3.3)
where:
  x, y = coordinates of the projected point on the image plane, in pixels
  z = scale factor (ideally 1)
  K = intrinsic matrix
  R = rotation matrix
  T = translation matrix
  X, Y, Z = coordinates of the object point in the real world
z is the scale factor and is used if the lens scales the image. K contains the camera's
intrinsic parameters, while R and T form a transformation matrix that relates a 3D point
in the world to its projection onto the 2D image plane. The intrinsic matrix of the camera
is defined by eqn. (3.4):
        | fx   γ   qx |
    K = |  0   fy  qy |                                              (3.4)
        |  0   0    1 |

where:
  fx, fy = focal lengths in pixels
  qx, qy = principal point
  γ = skew coefficient between the x and y axes
The focal lengths (fx, fy) are the distances between the pinhole and the image plane
(on the image sensor), expressed in pixels. They can be converted to distances in
millimetres using a scale factor inherent to the lens. The skew (γ) is non-zero only
when the image sensor axes are not perpendicular.
The camera extrinsic parameters are defined as the 3-dimensional location of the
camera in the world coordinate system (or conversely, the transformation coordinates
of a point to a coordinate system that is fixed with respect to the camera). The
augmented transformation matrix (3×4) is given by eqn. (3.5):
                                      | r11  r12  r13  t1 |
    Transformation matrix = [R | T] = | r21  r22  r23  t2 |          (3.5)
                                      | r31  r32  r33  t3 |

where:
  R = rotation matrix (3×3)
  T = translation matrix (3×1)
The pixel coordinates (m, n) of a point on the 2D image plane can then be
calculated by eqn. (3.6):
    m = fx · x + qx
    n = fy · y + qy                                                  (3.6)
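A minimal numeric sketch of the projection described by equations (3.3) to (3.6) is given below; the rotation, translation, focal lengths and principal point are placeholder values chosen for illustration, not the calibrated Kinect parameters reported in chapter 4:

using System;

class PinholeProjection
{
    // Project a world point (X, Y, Z) to pixel coordinates (m, n) per equations (3.3)-(3.6).
    static (double m, double n) Project(double[,] R, double[] T, double fx, double fy,
                                        double qx, double qy, double X, double Y, double Z)
    {
        // Camera coordinates: [xc yc zc]^T = R·[X Y Z]^T + T  (the extrinsic part of eq. 3.3)
        double xc = R[0, 0] * X + R[0, 1] * Y + R[0, 2] * Z + T[0];
        double yc = R[1, 0] * X + R[1, 1] * Y + R[1, 2] * Z + T[1];
        double zc = R[2, 0] * X + R[2, 1] * Y + R[2, 2] * Z + T[2];

        // Normalized image coordinates (perspective division by the scale factor z)
        double x = xc / zc, y = yc / zc;

        // Pixel coordinates via the intrinsic parameters, equation (3.6), zero skew assumed
        return (fx * x + qx, fy * y + qy);
    }

    static void Main()
    {
        // Identity rotation and zero translation: camera frame coincides with world frame
        double[,] R = { { 1, 0, 0 }, { 0, 1, 0 }, { 0, 0, 1 } };
        double[] T = { 0, 0, 0 };

        // Placeholder intrinsics (pixels); a point 100 mm off-axis at 500 mm depth
        var (m, n) = Project(R, T, 365.0, 365.0, 256.0, 212.0, 100.0, 0.0, 500.0);
        Console.WriteLine($"pixel = ({m:F1}, {n:F1})");   // ≈ (329.0, 212.0)
    }
}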
The camera's intrinsic matrix given in equation (3.4) does not incorporate the radial
distortion effects of the lens, and therefore these have to be dealt with separately. The
distortions can be categorized into two types, radial distortion and tangential distortion,
as shown in figure 3.10. Radial distortion is due to imperfections in the manufacturing
of lenses and is also prominent in special lenses such as wide-angle, fish-eye and
telephoto lenses.
Figure 3.10: Types of distortion in an image [76]
Tangential distortions occur due to any misalignment between the lens and the
image sensor, which should be perfectly parallel to each other. Radial distortion can
be negative (pincushion distortion) or positive (barrel distortion), and it relates the
distance of a pixel from the centre of the source image to the equivalent distance in
the acquired image. In the case of distortion, the camera equation given in equation
(3.6) is extended by replacing x and y with x' and y'; the tangential correction is given
in equations (3.7) and (3.8) and the radial correction in equations (3.9) and (3.10):

    x' = x + 2·p1·x·y + p2·(r² + 2x²)                                (3.7)
    y' = y + p1·(r² + 2y²) + 2·p2·x·y                                (3.8)
    x' = x·(δ + α·r² + β·r³ + ζ·r⁴)                                  (3.9)
    y' = y·(δ + α·r² + β·r³ + ζ·r⁴)                                  (3.10)
    r² = x² + y²                                                     (3.11)
where:
  δ = linear scaling of the image
  α, β, ζ = radial distortion coefficients of the lens
  p1, p2 = tangential distortion coefficients
  x', y' = undistorted (corrected) pixel locations

The corrected coordinates x' and y' are thus represented by a higher-order polynomial
in equations (3.9) and (3.10). Ideally, the scaling value δ is 1; an undistorted image has
δ = 1 and α = β = ζ = 0. Positive values of these parameters shift points towards the
centre, counteracting pincushion distortion, while negative values shift points away from
the centre, counteracting barrel distortion. Changes in α affect only the outermost pixels
of the image, whereas changing β has a more uniform effect on the image distortion [77].
To avoid any scaling while removing distortion, the parameters should be chosen so that
α + β + ζ + δ = 1. Furthermore, x and y are normalized values between 0 and 1.
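A short sketch that combines the radial terms of equations (3.9)-(3.10) with the tangential terms of equations (3.7)-(3.8) is shown below; the coefficient values are placeholders chosen only to illustrate the model, not the calibrated values reported later:

using System;

class DistortionModel
{
    // Apply the distortion model of equations (3.7)-(3.11) to a normalized image point
    // (x, y), returning the corrected point (x', y'). Coefficient names follow the text.
    static (double xd, double yd) Correct(double x, double y,
        double delta, double alpha, double beta, double zeta,   // scaling + radial terms
        double p1, double p2)                                    // tangential terms
    {
        double r2 = x * x + y * y;                 // equation (3.11)
        double r = Math.Sqrt(r2);
        double radial = delta + alpha * r2 + beta * r2 * r + zeta * r2 * r2;

        double xr = x * radial;                    // radial part, equation (3.9)
        double yr = y * radial;                    // radial part, equation (3.10)

        double xd = xr + 2 * p1 * x * y + p2 * (r2 + 2 * x * x);   // tangential, eq. (3.7)
        double yd = yr + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y;   // tangential, eq. (3.8)
        return (xd, yd);
    }

    static void Main()
    {
        // Illustrative coefficients only (an undistorted camera has δ=1 and α=β=ζ=p1=p2=0)
        var (xd, yd) = Correct(0.5, 0.3, delta: 1.0, alpha: -0.16, beta: 0.0, zeta: 0.0,
                               p1: -0.003, p2: 0.002);
        Console.WriteLine($"corrected point: ({xd:F4}, {yd:F4})");
    }
}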
3.3.1.2 Underwater Calibration of Kinect Cameras
For most professional cameras, the distortion coefficients are provided by the
manufacturer; in general, however, these parameters have to be calculated for the
particular camera in use. This is easily accomplished by acquiring images of a
checkerboard with a known square size. Multiple images of the checkerboard can then
be used to calculate the intrinsic and extrinsic parameters. Since the Kinect cameras
were designed to be used in open air, the Kinect drivers come with pre-defined camera
calibration parameters that closely approximate any KinectToF device manufactured.
For use under water, however, the camera intrinsic parameters and distortion
coefficients have to be re-calculated. The calculated parameters can then be used to
undistort the images before using them for scene reconstruction.
Several ready-to-use methods are available to calculate the intrinsic parameters and
distortion coefficients, including a dedicated camera calibration toolbox in MATLAB.
For calibrating the Kinect, however, an open-source tool called the GML C++ Camera
Calibration Toolbox [78] was utilized. A reference checkerboard with a 5×6 pattern of
30×30mm squares was produced and coated so that it could be taken underwater. In
addition to the black-and-white checkerboard, a coloured checkerboard was also printed
to test colour restoration algorithms on images acquired by the RGB camera. The
colour-corrected images can then be used for generating coloured 3D meshes if the
images are acquired under sufficient light. Results of the camera calibration procedure,
along with the calculated coefficients, are discussed in detail in section 4.2.2.
Figure 3.11: Pictures of (a) black and white calibration checkerboard and (b) colour
checkerboard taken underwater from the Kinect RGB camera.
3.3.2 Time of Flight Correction in Underwater Environment
KinectToF calculates depth by emitting a continuous wave of IR from its embedded
NIR source and measuring the phase difference between the transmitted and received
signals. The hardware uses the time it takes for the beam to return to calculate the
distance. However, KinectToF does not consider the difference in the speed of light
between media. Although conceptually simple, this results in 3D meshes being created
at an incorrect distance from the camera position. Since KinectToF measures the return
time of the NIR rays, and light propagates more slowly in water, the depth reported by
KinectToF is farther than the actual distance to the object.
Additionally, since KinectToF is encased in a housing, the infrared light must pass
through three different media, as shown in figure 3.12. The first is the air between the
imaging sensor and the housing's inner face, the second is the housing material, and the
last is water. Originating from KinectToF, the NIR beam therefore crosses two media
transitions. The distance measured by KinectToF at each pixel (dpixel) is the sum of three
different distances: the distance between the image sensor and the housing's inner face
(dair), the thickness of the housing material (dperspex), and the distance in water to the
actual object or surface (dwater). Similarly, the time taken by the NIR beam is split into
three parts, as given in equations (3.12) and (3.13) respectively:

    dpixel = dair + dperspex + dwater                                (3.12)
    tpixel = tair + tperspex + twater                                (3.13)
The time components assumed by KinectToF when calculating the depth can be found
using equation (3.14), and the time actually spent in water (twater) is then recovered
using equation (3.15):

    tpixel = dpixel / cair,   tair = dair / cair,   tperspex = dperspex / cperspex    (3.14)

    twater = tpixel − (tair + tperspex)                              (3.15)

The new time in water can then be used to calculate the new distance (dcorrected) using
the speed of light in water, as given in equation (3.16). This corrected distance is then
used to update the depth at each pixel using equation (3.17).

    dcorrected = twater × cwater                                     (3.16)
Figure 3.12: Calculating the corrected time of flight values
    dpixel = dcorrected + dair + dperspex                            (3.17)
As a result, the updated depth value is closer to the sensor than the one originally
calculated by KinectToF. The same calculation is performed for every pixel of each
512×424 depth image to obtain the actual depth, as illustrated by the values in table 3.2.
Figure 3.13: Simulated depth distance (mm): measured (red) vs actual (blue)
Table 3.2: Measured vs actual distance measured by Kinect under water

  Measured distance (mm)   500    550     600    650     700    750     800    850     900
  Actual distance (mm)     375    412.5   450    487.5   525    562.5   600    637.5   675
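The correction of equations (3.14)-(3.17) can be condensed into a few lines of code; the sketch below is a simplified illustration in which the air gap and window thickness are set to zero, which reproduces Table 3.2 to within rounding:

using System;

class TofCorrection
{
    const double NAir = 1.00, NPerspex = 1.49, NWater = 1.33;  // refractive indices used above

    // Correct a depth reported by KinectToF (which assumes propagation in air),
    // following equations (3.14)-(3.17). All distances are in mm.
    static double CorrectDepth(double dPixel, double dAir, double dPerspex)
    {
        double cAir = 1.0;                 // speeds expressed relative to the speed in air
        double cPerspex = cAir / NPerspex;
        double cWater = cAir / NWater;

        double tPixel = dPixel / cAir;             // total time as Kinect interpreted it (3.14)
        double tAir = dAir / cAir;                 // time actually spent in the air gap
        double tPerspex = dPerspex / cPerspex;     // time actually spent in the window

        double tWater = tPixel - (tAir + tPerspex);        // equation (3.15)
        double dCorrected = tWater * cWater;               // equation (3.16)
        return dCorrected + dAir + dPerspex;               // equation (3.17)
    }

    static void Main()
    {
        // Neglecting the air gap and window reproduces Table 3.2 (e.g. 500 mm -> ~376 mm)
        foreach (double d in new[] { 500.0, 600.0, 700.0, 800.0, 900.0 })
            Console.WriteLine($"reported {d} mm -> corrected {CorrectDepth(d, 0.0, 0.0):F1} mm");
    }
}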
3.3.3 Distortion Removal for ToF Camera in Underwater Medium
Images acquired by cameras used underwater show a distinct distortion. This
distortion is much more evident in depth images and in the resulting point clouds of a
flat wall or surface. The visible effect is that the reconstructed mesh of a flat surface
takes on a peculiar concave shape, bulging towards the camera at the corners and pinched
backwards at the centre. The effect is significantly stronger when the depth data are
acquired underwater. This amplification of the distortion is due to the refraction of light
in water and the effect of the transparent housing, both of which have to be incorporated
into the mesh generation code. The correction developed for removing this unwanted
distortion is a two-stage process: the first stage addresses the distortion in the point cloud
due to the refraction of infrared light in water, whereas the second stage removes the
pincushion optical distortion, which is also visible in the RGB images. Both stages are
discussed in detail below.
3.3.3.1 Refraction Correction of Depth Data Underwater
Point clouds acquired by KinectToF underwater show pincushion radial distortion that is
significantly amplified compared with the distortion measured in air by Butkiewicz [48].
This distortion is much more visible in depth images of a flat surface, where the
reconstructed mesh shows a concave shape, bulging towards the camera at the corners
and pinched backwards at the centre. The concave nature of the acquired point cloud
can be visualized as shown in figure 3.14, which illustrates the detection of a depth point
in water. The cause of this amplified distortion is that the pinhole camera model was
developed for taking images in air, where the incident light does not have to interact
with different media (other than the camera lens) during image formation. In an
underwater environment, where the camera is enclosed in a casing, the speed of light
differs between the media; this interaction must first be modelled and its effect
incorporated into underwater 3D mesh creation.
Figure 3.14: Formation of a virtual image due to refraction
As illustrated in figure 3.14, the detected point is in line with the incident NIR ray,
offset along the normal from the actual point. Moving away from the centre, the error
increases with the angle of incidence of the NIR ray: rays of light near the centre of the
camera undergo almost no refraction compared with rays towards the outer regions and
edges. Note that, for the sake of simplifying the problem, it is assumed that the NIR
emitters and the depth camera sensor have no horizontal offset between them and that
the principal axis of the depth camera is aligned with the optical axis of the image sensor.
Using Snell's law, it can be shown that the shift in the depth (dshift) depends on the
incident and refraction angles and on the actual depth measured. Using the nomenclature
defined in figure 3.12, we define the incident angle as θair and the refracted angle in
Perspex as θperspex for a single ray originating from KinectToF and passing through the
air gap between the sensor surface and the Perspex housing. The refracted angle from
Perspex into water is given by θwater. Note that the only numerical measures available
are the uncorrected depth value and the thickness of the Perspex. Using the law of sines,
the shift in the depth dshift can be derived as given by equations (3.18), (3.19) and (3.20).
Figure 3.15: Calculating refraction of a ray of light for two materials resulting in a
shift in perceived depth
    b = dmeasured / sin(90° − θ1)                                    (3.18)

    dshift = b · sin(θ1 − θ2) / sin(θ2)                              (3.19)

    dshift = dmeasured · sin(θ1 − θ2) / (sin(90° − θ1) · sin(θ2))    (3.20)

where:
  dshift = shift in the measured distance
  θ1 = incident angle
  θ2 = refracted angle
This shift must be incorporated for every pixel of the 512×424 depth image by
calculating the incident angle, the refraction angles and the radial depth of each pixel
with respect to the principal axis of the depth camera. The incident and refracted angles
are first calculated for the transition from air to Perspex; the refracted ray then becomes
the incident ray, and the angles are recalculated for the transition from Perspex to water.
To calculate these angles, we assume that a single ray of light is generated for every
pixel of the image sensor and returns to the same pixel after being reflected by the scene
in the FoV. This method is analogous to the ray-casting technique used in the field of
computer graphics. We first translate all pixel locations from the 2-dimensional Cartesian
coordinate system (x, y) to a 3-dimensional spherical coordinate system (r, θ, φ). This is
done with two simple rectangular-to-polar trigonometric transformations, defined by the
following core set of spherical trigonometric equations:
    r = √(rx² + ry² + rz²)                                           (3.21)

    θ = cos⁻¹( rz / √(rx² + ry² + rz²) ) = cos⁻¹( rz / r )           (3.22)

    φ = tan⁻¹( ry / rx )                                             (3.23)
Figure 3.16: Spherical to image coordinate conversion
The result is three 512×424 matrices: one for the radial distance r, one for the
inclination angle θ (in radians) and one for the azimuth angle φ (also in radians). Using
the θ and φ angles, we can calculate the orthogonal projections of the radial distance r
along the z-axis.
Mathematically, let the spherical coordinates of the ray segment inside the housing
be denoted by rz_air, its projections by rzx_air and rzy_air, and its angles by θair and
φair, respectively. After striking the Perspex, the NIR ray changes path due to refraction,
and θair and φair become the incident angles. The new path is denoted by rz_perspex,
rzx_perspex, rzy_perspex and θperspex, φperspex for the magnitude, projections and
angles inside the Perspex, respectively. Lastly, the NIR ray is refracted again as it moves
from Perspex to water and the path is traced to the depth values in water, with the
spherical coordinates denoted by rz_water, rzx_water, rzy_water and θwater, φwater for
the magnitude, projections and angles in water, respectively. The magnitude rzy_water
is the new depth value in water along the z-axis, incorporating the effect of refraction in
all the media, as the ray has been traced along its entire path. The error in distance for
each pixel can then be calculated by subtracting the original depth value dpixel from
rzy_water. The entire process is visualized in figure 3.18.
Figure 3.17: Projections of a depth point on the Kinect sensor image plane
Figure 3.18: Methodology to trace the light ray path for each depth pixel
In figure 3.18, the yellow plane is the x-axis projection of the light ray and the blue
plane is the y-axis projection. The result of this ray-tracing method is visible in the point
clouds and the 3D mesh created: the concave shape of the plane is largely corrected, and
the point cloud is adjusted to form a flatter plane before the actual 3D mesh is generated.
By incorporating the refraction angles and calculating the x and y projections for all the
pixels in the 512×424 depth image, we can find the actual position of each point. Results
have been simulated on a depth plane that is curved around a central axis, simulating a
type of pincushion distortion, as given in figure 3.19.
It can be seen that by tracing the ray paths originating from the NIR source and
captured by the IR camera, both of which are assumed to be concentric, a correction can
be calculated that incorporates the different refraction angles of each medium.
Pseudocode of the devised method is given as algorithm 1 to explain the process further.
Figure 3.19: (a) The bottom plane (blue) is the simulated curved point cloud, whereas
the top plane (yellow) is the refraction-corrected point cloud (b) A plot of the calculated
error distance, which grows larger as the distance from the central axis increases.
Algorithm 1: Refraction correction

    // build the incident ray matrix (θ and φ) from the sensor field of view
    for i = +VerticalFOV to -VerticalFOV in 424 steps
        for j = +HorizontalFOV to -HorizontalFOV in 512 steps
            φair = i, θair = j
        end
    end
    for each image pixel u ∈ depth image Di
        // calculate the refracted angles for the air → Perspex ray
        [θperspex, φperspex] = calculateRefractedAngles(θair, φair, ηair, ηperspex)
        // calculate the ray projection magnitudes inside the Perspex
        [rzx_perspex, rzy_perspex] = calculateRayLengths(θperspex, φperspex)
        // calculate the refracted angles for the Perspex → water ray
        [θwater, φwater] = calculateRefractedAngles(θperspex, φperspex, ηperspex, ηwater)
        // calculate the ray projection magnitudes in water
        [rzx_water, rzy_water] = calculateRayLengths(θwater, φwater)
    end
    corrected_depth_value = rzy_water
    dShift = rzy_water - depth_img
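The following C# sketch illustrates the per-pixel ray-tracing step of algorithm 1; it is an illustration of the approach rather than the exact thesis implementation, and the ~70°×60° field of view, 20 mm air gap and 2 mm window thickness used in the example are assumed values:

using System;

class RefractionCorrection
{
    const double NAir = 1.00, NPerspex = 1.49, NWater = 1.33;

    // Snell's law: refracted angle (radians) for a ray passing from medium n1 to n2
    static double Refract(double incident, double n1, double n2)
        => Math.Asin(Math.Clamp(n1 * Math.Sin(incident) / n2, -1.0, 1.0));

    // Trace one depth pixel: 'angle' is the ray's inclination from the optical axis,
    // 'depth' the (already ToF-corrected) radial distance reported for that pixel (mm),
    // dAir/dPerspex the air gap and window thickness along the optical axis (mm).
    static double CorrectedZ(double angle, double depth, double dAir, double dPerspex)
    {
        double aPerspex = Refract(angle, NAir, NPerspex);     // air -> Perspex
        double aWater = Refract(aPerspex, NPerspex, NWater);  // Perspex -> water

        // Path lengths in air and in the window, measured along the (refracted) ray
        double rAir = dAir / Math.Cos(angle);
        double rPerspex = dPerspex / Math.Cos(aPerspex);
        double rWater = Math.Max(depth - rAir - rPerspex, 0.0);

        // Corrected depth along the optical (z) axis: sum of the z-projections of each segment
        return rAir * Math.Cos(angle) + rPerspex * Math.Cos(aPerspex) + rWater * Math.Cos(aWater);
    }

    static void Main()
    {
        // Nominal KinectToF depth-camera field of view of roughly 70° x 60° over 512 x 424 pixels
        const double hFov = 70.0, vFov = 60.0;
        int col = 0, row = 0;                                   // a corner pixel (worst case)
        double thetaH = (col - 255.5) / 512.0 * hFov * Math.PI / 180.0;
        double thetaV = (row - 211.5) / 424.0 * vFov * Math.PI / 180.0;
        double angle = Math.Sqrt(thetaH * thetaH + thetaV * thetaV);  // combined inclination

        Console.WriteLine($"corner-pixel corrected z = {CorrectedZ(angle, 600.0, 20.0, 2.0):F1} mm");
    }
}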
3.3.3.2 Pincushion Distortion Removal in Depth Images
Even with the refraction correction devised in the previous section, the concave
nature of the generated mesh was not fully removed. This was deduced to be a
consequence of the camera calibration of KinectToF. Due to the unavailability of the
camera specifications and technical details from Microsoft (it being a proprietary
product), significant work has been done by researchers not only on deducing the
working principles of the device but also on properly calibrating the RGB, depth and IR
cameras of both versions of Kinect. The Kinect SDK v2.0.x provides some hard-coded
calibration values, but since calibration varies from device to device, it must be redone
for a more accurate alignment. Because the Kinect cameras have been calibrated for use
in open air, the calibration values must be recalculated to cater for the distortion
produced by the housing and by refraction in water. By recalibrating the cameras and
applying the calibration model to remove the pincushion distortion, it was estimated that
the remaining radial distortion effects would be significantly reduced, thereby generating
a much flatter 3D mesh in addition to the gains from the ray-tracing methodology
designed above.
3.3.4 3D Point Cloud Noise Filtering in Turbid Medium
All types of underwater data acquisition face noise from the medium. As water is a
rich environment for suspended microbial, invisible and semi-visible life forms, any
sensor working in water, whether acoustic (sonar) or optical (RGB or depth cameras),
has to cater for unwanted reflections of its rays from these suspended bodies.
Furthermore, just as air carries a significant amount of dust and smog particles, water
carries many suspended particles alongside living organisms. Both are a cause of
significant noise in all kinds of data and must be actively removed before the data can
be used further. This noise is in addition to any noise generated inherently by the sensor
itself, and it therefore mandates an additional layer of complex, and often
computationally expensive, noise removal algorithms as part of the pre-processing stage.
The depth data from Kinect contain a significant amount of random noise, which
stems from the way Kinect works. To capture the depth of a scene and its contents, the
scene is illuminated with infrared light and the reflected rays are captured by the imaging
sensor. The reflections reaching a single pixel often come from multiple objects hit by
the same IR ray. The noisy data and multiple reflections are cleaned by the onboard
embedded processors before being forwarded to the image acquisition drivers on the PC
side. Nevertheless, the depth reported for a single point is not constant over multiple
frames, which results in noisy depth values for each pixel over successive frames and,
in turn, in noisy depth data even for an entirely flat surface. A sample point cloud view
is given in figure 3.20. As can be seen, the point cloud, which looks quite clean and clear
from the Kinect's viewpoint (figure 3.20(d)), is quite noisy when seen from the left view
(figure 3.20(c)). The point cloud contains a large number of random points caused by
suspended particles and other microbial life that reflect the infrared rays in water. The
red line in the left view is the minimum distance at which the Kinect can measure depth,
limited by design to 500mm.
Figure 3.20: Front and left views of acquired noisy point cloud
As briefly explained in section 2.5.4, Kinect Fusion includes a bilateral filter for
cleaning the depth data before fusing it into a mesh. In general, air in a normal
environment does not contain a significant number of suspended particles that reflect
infrared rays, so a simple bilateral filter is sufficient for normal use. The underwater
environment, however, is significantly more challenging for imaging at any wavelength:
the number of suspended particles in natural water increases dramatically compared to
open air and, due to refraction, the amount of unwanted reflection, particularly at the
outer periphery of the sensor's field of view, also increases considerably.
To cater for this additional noise in the underwater medium, additional filtering has
to be performed in the pre-processing stage. However, in order to retain real-time
processing, the filtering must be fast enough that performance is not slowed down
excessively. Several popular filters exist for removing noise from 2-D images. Since the
noise can be categorized as salt-and-pepper noise, a median filter was best suited to this
case. A median filter works by sorting the depth values within a fixed window in
ascending order and replacing the value at the centre of the kernel with the median,
i.e. the centre of the sorted range. The index of the median value is given by
C = (N + 1)/2, where N is the number of pixels in the window and C is the index of the
centre of the sorted values. This is better than a simple averaging kernel, as using the
median avoids biasing the depth values and helps remove any abnormal depth values in
the field of view. A caveat is that, since the filter is based on sorting, which is a
computationally expensive process, an implementation has to be selected that offers the
fastest sorting. Similarly, the window size controls the speed of the filter, with a bigger
window leading to much slower filtering. Several implementations of median filters have
been devised with performance in mind, such as those of Zhang et al. [79] and Huang
et al. [80], and a good comparison of various median filter implementations is given by
Perreault et al. [81]. Considering ease of implementation and speed, the fast
histogram-based median filter was implemented, along with a normal sorting-based
median filter purely for comparison purposes. The filter can be enabled or disabled at
runtime; however, the mesh is regenerated once the state is toggled.
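A simplified sketch of a histogram-based median filter for a 16-bit depth frame is shown below; for brevity it rebuilds the window histogram at every pixel rather than updating it incrementally as in Huang et al. [80], but the median selection step is the same:

using System;

class DepthMedianFilter
{
    // Median-filter a 16-bit depth image using a per-window histogram (salt-and-pepper
    // noise removal). A production version would slide the histogram incrementally
    // (Huang et al. [80]); this simplified form rebuilds it for every pixel.
    static ushort[] Filter(ushort[] depth, int width, int height, int radius, int maxDepthMm)
    {
        var result = new ushort[depth.Length];
        var hist = new int[maxDepthMm + 1];

        for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            Array.Clear(hist, 0, hist.Length);
            int count = 0;
            for (int dy = -radius; dy <= radius; dy++)
            for (int dx = -radius; dx <= radius; dx++)
            {
                int yy = y + dy, xx = x + dx;
                if (yy < 0 || yy >= height || xx < 0 || xx >= width) continue;
                int v = Math.Min(depth[yy * width + xx], (ushort)maxDepthMm);
                hist[v]++; count++;
            }
            // Walk the histogram to the (N+1)/2-th value, i.e. the median of the window
            int target = (count + 1) / 2, seen = 0, median = 0;
            for (int v = 0; v <= maxDepthMm; v++) { seen += hist[v]; if (seen >= target) { median = v; break; } }
            result[y * width + x] = (ushort)median;
        }
        return result;
    }

    static void Main()
    {
        // Tiny illustrative frame: a flat 600 mm surface with a dropout and a stray reflection
        int w = 5, h = 5;
        var frame = new ushort[w * h];
        for (int i = 0; i < frame.Length; i++) frame[i] = 600;
        frame[7] = 0; frame[12] = 4500;
        var clean = Filter(frame, w, h, radius: 1, maxDepthMm: 8000);
        Console.WriteLine($"centre pixel: {frame[12]} mm -> {clean[12]} mm");   // 4500 -> 600
    }
}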
3.3.5 Customized Kinect Fusion Implementation
A software implementation of Kinect Fusion together with the additional noise filters
and the time-of-flight and refraction correction algorithms was an integral part of this
research work. The implementation of the Kinect Fusion algorithm was adapted from the
original code provided by Microsoft with SDK 2.0. Several tweaks and alterations were
made to develop a customized application, with a front-end Graphical User Interface
(GUI) developed in XAML and a backend coded in C#, using the Windows Presentation
Foundation (WPF) renderer for visualizing the 3D mesh. A brief explanation of the
various functions of the GUI is given in figure 3.21.
Figure 3.21: The main user interface and sub-windows
A detailed explanation of the software and its additional windows, along with a short
user tutorial, is given in Appendix B. The software developed for this research work
using the Kinect SDK 2.0 captures data from a Kinect attached to the USB port, or from
data played back from Kinect Studio, for processing in real time. Data captured from
the RGB and IR cameras is displayed in real time, along with the generated depth image.
Additionally, a coloured image of the camera-tracking alignment results, colour-coded
by the per-pixel algorithm output, is displayed to help analyse the tracking and
alignment; it may also be used as input to additional vision algorithms such as object
segmentation. The values of this colour image vary depending on whether a pixel was
valid and used in tracking (an inlier) or failed one of the tests (an outlier). A detailed
flowchart of the Kinect Fusion implementation, along with the main function names, is
given in figures 3.22 and 3.23.
A detailed explanation of the user interface, its sub-windows and its usage
methodology has been provided in the appendix.
Figure 3.22: Kinect Fusion implementation flowchart (Kinect Fusion SDK function
names are written in red) (Page 1)
Figure 3.23: Kinect Fusion implementation flowchart (Kinect Fusion SDK function
names are written in red) (Page 2)
3.3.6 Qualitative & Quantitative Performance Criteria for 3D Meshes
The 3D reconstructed meshes need to be compared both qualitatively and
quantitatively. Various researchers have defined qualitative parameters for the analysis
of 3D meshes, such as [82]; however, no direct evaluation method is available for 3D
reconstructed meshes. The parameters used to evaluate the qualitative output of the
reconstruction are inherently subjective and had to be selected intuitively, based on the
perceived visual appearance and quality of the output meshes. The devised parameters
for the qualitative evaluations are given in table 3.3:
Table 3.3: Visual parameters for qualitative analysis

  Parameter              Description
  Smoothness             The mesh should form a continuous and smooth surface,
                         representing the original object being scanned
  Feature preservation   Major features of the original object must be retained, making it
                         easily distinguishable from the background
Keeping in view the methodology defined in section 2.6, the qualitative analysis can
be devised as a four-stage process. The first stage is a visual comparison against meshes
generated with the original Kinect Fusion code for the selected objects scanned in open
air; these scans form the ground truth for each object. In the second stage, they are
compared with meshes generated by scanning the same objects underwater without any
noise filtering applied. In the third stage, the same meshes are reconstructed using the
devised noise filtering, and lastly the meshes are regenerated with both the noise filtering
and the time-of-flight correction incorporated. The qualitative analysis workflow is
shown in the flow chart in figure 3.24.
Figure 3.24: Qualitative analysis comparison process
In order to quantitatively analyse the results, we need a mesh-to-mesh comparison
and to calculate the sizes of the reconstructed objects. This is straightforward for the 3D
printed objects selected for scanning, as their 3D models and meshes are available and
can be compared directly with the reconstructed meshes. For objects whose 3D models
are not available, the scanned mesh created by the original Kinect Fusion application
can be taken as ground-truth data, as the scanning quality of Kinect in air has already
been shown to be quite accurate to the original object being scanned. Once an object
has been scanned in air, it can be cropped out of the 3D mesh scene while retaining its
original dimension data for further use.
Several techniques have been developed previously for comparing two 3D meshes,
two point clouds, or a combination of a mesh and a point cloud. Popular programs such
as MeshLab [83] and CloudCompare [84] are freely available, providing the facility to
compare two meshes or point clouds and give a numeric comparison based on one of
the standard algorithms. Typically, a triangular mesh is simply a point cloud (the mesh
vertices) with an associated topology (triplets of 'connected' points corresponding to
each triangle), so a mesh can be treated as a set of points in 3D space. One of the most
common methods of measuring the distance between two such point sets is the
Hausdorff distance, a metric between two point sets that measures how far one set is
from the other and can therefore be used to determine the degree of resemblance of two
point clouds. Mathematically, it is defined as the maximum distance from a point in one
set to the nearest point in the other set, and is given by the following equation:

    h(A, B) = max_{ψ∈A} { min_{ω∈B} { σ(ψ, ω) } }                    (3.24)
where ψ and ω are points in sets A and B, respectively, and the metric σ(ψ, ω) can be
any distance metric, such as the Euclidean distance between two points. The distances
can then be represented as a colour heat map, indicating the deviation of each point
from the reference plane on a colour scale. An example plane and its heat map,
compared to an ideal flat plane, are given in figure 3.25.
Figure 3.25: (a) Target point cloud (green) and reference point cloud (yellow)
(b) finding the distances of the point clouds (c) error heat map
In the simulated example of figure 3.25, blue represents the closest distances
(positive and negative relative to the reference plane), green the intermediate distances
and red the farthest distances. An ideal plane that is completely aligned will have a
single-colour heat map. The colour scale can be represented differently as well, but the
overall concept remains the same.
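A brute-force sketch of equation (3.24) is given below; tools such as CloudCompare use accelerated nearest-neighbour searches, but the definition being evaluated is the same, and the two tiny point sets used here are purely illustrative:

using System;

class HausdorffDistance
{
    // Directed Hausdorff distance h(A, B) = max over ψ in A of the min over ω in B of σ(ψ, ω),
    // with σ taken as the Euclidean distance (equation 3.24). O(|A|·|B|) brute force.
    static double Directed(double[][] A, double[][] B)
    {
        double hMax = 0.0;
        foreach (var p in A)
        {
            double dMin = double.MaxValue;
            foreach (var q in B)
            {
                double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
                dMin = Math.Min(dMin, Math.Sqrt(dx * dx + dy * dy + dz * dz));
            }
            hMax = Math.Max(hMax, dMin);
        }
        return hMax;
    }

    static void Main()
    {
        // Two tiny example "meshes": vertices of a flat plane and a slightly bulged copy
        var reference = new[] { new[] { 0.0, 0.0, 0.0 }, new[] { 1.0, 0.0, 0.0 }, new[] { 0.0, 1.0, 0.0 } };
        var scanned   = new[] { new[] { 0.0, 0.0, 0.2 }, new[] { 1.0, 0.0, 0.1 }, new[] { 0.0, 1.0, 0.0 } };

        // The symmetric Hausdorff distance is the larger of the two directed distances
        double h = Math.Max(Directed(scanned, reference), Directed(reference, scanned));
        Console.WriteLine($"Hausdorff distance = {h:F3}");   // 0.200 for this example
    }
}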
3.4 Experimental Setup
To acquire 3D Kinect data in an underwater environment, multiple data acquisition
experiments were conducted. These experiments were carried out in controlled
environments, and the results are assumed to be similar in deeper natural environments.
Two locations were selected for data collection, based on the type of conditions they
offer: the Offshore Experiment Lab and the swimming pool within the UTP premises.
The offshore lab houses a Wave Flume (Edinburgh) and a Wave Tank (HR Wallingford)
that measures 10×20×1.5 m (W×L×H) and has a variable depth of up to 1.5 m. The
stagnant water contained in the tank is turbid, with suspended particles and small
amounts of algae. The swimming pool provides clear water with no visibility issues in
the RGB domain; its depth is 1.5 m at the shallow end, increasing to 3 m at the deeper
end. The pool thus provides an ideal controlled environment with clear water, free from
any visible suspended particles.
Figure 3.26: Experimental setup for data acquisition at swimming pool and offshore
experiment facility at UTP
3.4.1 KinectToF Underwater Dataset and Selection of Test Objects
Although significant research has been done in underwater environments, with data
acquired from various sensors and methods, there is a significant lack of publicly
available underwater 3D imaging datasets. This is partly due to the challenges of
acquiring data underwater: not only is the required equipment costly and specialized,
but significant effort is also needed to design underwater data acquisition experiments,
and special care must be taken to obtain usable data. Since KinectToF had not previously
been used for acquiring underwater images, this research required the acquisition of data
underwater to test the hypothesis. The data acquired by Kinect have been saved for offline simulation
and testing as well. Microsoft provides several tools bundled with the Kinect SDK,
one of them being the Kinect Studio. This is a closed source tool developed by
Microsoft, that allows capturing complete Kinect data from all its cameras and
provides RGB, IR and depth data along with the audio input from the array of
microphones in Kinect. Furthermore, the Kinect Studio also captures and displays 3D
point clouds and body tracking data in real time. These data streams can also be saved
for further analysis and processing in a closed proprietary file format (*.xef). These
files can later be loaded in the Kinect studio and seamlessly connected to the Kinect
drivers, simulating the sensor for use with any custom application using the Kinect
drivers and SDK. For this research, the data acquired in underwater were captured
using Kinect Studio so the data can be processed post collection. The objects were
first scanned outside of water to acquire a ground truth mesh. The same objects were
scanned again under water. The raw data of both these phases are available at the
above-referred link. Details of the Kinect Studio version used, and other
miscellaneous details are as follows:
Kinect Studio Version: 2.0.1410.19000
Kinect Sensor Version: 4.0.3916.0
Windows Environment: Windows 10 (Version 1607, Build 14393)
One of the major inherent requirements of Kinect fusion, like any image
reconstruction algorithm, is that the objects or view of interest must have distinct
features that can be processed. As Kinect works in the NIR spectrum, it is imperative
that the features of interest must be visible in the infrared spectrum. This requirement
dictates the selection of objects for scanning, as distinct features that are visible in
RGB domain such as textures or patterns are not necessarily also visible in infrared.
Also, since light is absorbed across the spectrum in water, textures on object surfaces
are less distinguishable than in open air. For scanning under water, it is therefore
preferable that the objects have easily distinguishable, angular edges and faces, so that
differences in depth are noticeable. Objects whose surface normals can be calculated
easily result in a better and smoother mesh, whether captured in open air or underwater,
whereas very fine textures or intricate surface designs are not easily distinguishable
under water and do not make good reconstruction test targets. Based on these
requirements, the objects listed in table 3.4 were selected for scanning and have been
used throughout the results section.
For qualitative and quantitative analysis, the objects detailed in table 3.4 were
selected because they provide a diverse range of characteristics, such as material, shape
and features, for comparison. Several of the scanned objects also proved helpful in
testing the limitations of the proposed system and algorithms, which are discussed in
detail in section 4.2.3.
Table 3.4: Objects selected for scanning and their characteristics

  Object                    Material                Noticeable feature in infrared                          Distinct texture or pattern in infrared
  Basketball                Rubber                  Spherical surface for reliable qualitative analysis     Linear seams and surface texture visible
                                                    of the reconstruction                                   in IR underwater
  3D printed house model    ABS plastic (painted)   Distinct orthogonal walls and shape visible in IR       Embossed features such as windows visible
                                                                                                            in IR underwater
  3D printed Rubik's Cube   ABS plastic (painted)   Distinct orthogonal walls and shape visible in IR       Embossed cube faces visible in IR underwater
  Coffee mug                Ceramic                 Distinct shape visible in IR                            None
  3D printed trophy stand   ABS plastic (painted)   Separate walls and plaque holder with visible           None
                                                    depths in water
  Cement brick              Cement                  Multiple orthogonal faces                               None
  Decorative table plant    Painted plastic         Randomly oriented small bundle of leaves                None
  Rubber safety shoe        Rubber                  None                                                    None
3.4.2 Uncorrected RGB, IR and Depth Images from Submerged Kinect
Several experiments were conducted under water to collect data for further analysis.
The objects mentioned in section 3.4.1 were scanned from several perspectives and in
several scanning styles, and the data were extracted from the *.xef files in image form.
Some sample RGB, IR and depth images are given in figure 3.27. Analysis of the raw
images shows a significant amount of pincushion distortion in both the RGB and IR
cameras; the distortion in the IR camera is carried over into the depth image and thus
into the generated point cloud. The effect of turbid water on the RGB and IR images can
be seen easily, as can the performance of the RGB camera in the absence of any external
light source.
As is clear from the raw RGB images in figure 3.27, taken underwater from inside the
protective casing, there is a clear distortion in the RGB images that can be characterized
as pincushion distortion. As Kinect is pre-calibrated only for use in air, the cameras need
to be recalibrated using the procedure defined in section 3.3.1. In addition to the
distortion, since Kinect is a ToF sensor that calculates depth by measuring the time taken
for the light to return, the actual depth in an underwater environment is different from
that originally reported by Kinect. The results of these corrections are discussed in detail
in the following sections. These additions also constitute the major contributions of this
thesis, wherein the standard Kinect Fusion algorithm is adapted to work with a Kinect
operating in a sealed enclosure under water. The observed effects and the corrections
devised are discussed in detail in the subsequent paragraphs.
Figure 3.27: Raw images captured from KinectToF cameras under water (a) RGB (b)
depth (c) infrared
3.4.3 KinectToF RGB and IR Camera Calibration
The Kinect's IR and RGB cameras were calibrated using a 7×8 checkerboard (square
size of 30 mm) and the free GML Camera Calibration Toolbox [78]. While calibration of
the RGB camera was not particularly problematic, especially when done in clear water,
calibration of the IR camera under water was not straightforward. The infrared images
acquired were of very low intensity due to the absorption of light, and therefore had to
be enhanced using simple contrast adjustment techniques available in any capable image
viewer or editor; Adobe Photoshop® CC 2014 was used to enhance the IR images via
curve adjustments of the intensity histogram. Even after enhancement, most of the
checkerboard images were unusable because of the additional noise amplified during the
adjustment process. Some of the calibration images used and their detected output are
given in figure 3.28, and the results of the camera calibration process are given in
section 4.2.2.
Figure 3.28: Infrared camera calibration underwater (a) original images (b)
enhanced IR images (c) calibration images used to calculate the parameters
3.4.4 Real-Time Data Collection, Scanning Rate and Parameters
Because 3D mesh construction in real time is computationally intensive, Kinect
Fusion requires the scanning rate to be slow in order to avoid blur or artefacts in the
acquired images. Since data acquired under water suffer additional turbidity and
suspended particles, causing unwanted blur in both the RGB and IR cameras, the
scanning rate over submerged objects had to be limited to a slow speed of a few
centimetres per second. Furthermore, as will be shown in the following sections, there
is an additional loss of depth data due to the absorption of NIR in water; for the best
mesh reconstruction results, scanning must therefore be carried out so that most of the
scene is captured in the generated point cloud. Details and effects are discussed in the
subsequent sections.
In addition to the scanning rate, the term 'real-time' in imaging applications can be
divided into 'firm real time' and 'soft real time', as noted by [85]. The firm real-time
category requires an update rate of at least 30 frames per second, whereas 'soft'
real-time systems are image processing systems in which slow processing of the
intermediate frames does not affect the end product. Since Kinect Fusion depends
significantly on the hardware it runs on, including the amount of physical RAM and
VRAM available and the specifications of the GPU or CPU being used, in addition to the
quality of the point cloud data provided to the algorithm, the mesh reconstruction falls
into the category of 'soft real-time' systems. During reconstruction, if the quality of the
point cloud does not meet a pre-set tracking error threshold, the mesh reconstruction is
paused until tracking is re-acquired. A reconstruction rate of 30 fps is therefore
achievable, provided the processing hardware meets the algorithm's recommended
requirements.
Kinect Fusion has several parameters that can be modified at runtime to achieve
better working performance, such as the voxel resolution, the number of voxels per axis
and the integration weight. The trade-off is between reconstruction performance and the
quality of the generated mesh. As Kinect Fusion is not designed to be a memory-efficient
algorithm [86] but focuses on reconstruction speed, the mesh being constructed cannot
simultaneously be of very high resolution and cover a very large scene depth. In order
to use the additional filters and the refraction correction algorithm efficiently alongside
this memory-expensive method, several parameter combinations were tested to give the
best balance between speed and accuracy of the mesh. After much experimentation, the
following values were selected and are used throughout the remainder of this document:
Voxel size (X-Axis): 256
Voxel size (Y-Axis): 256
Voxel size (Z-Axis): 256 or 384
Voxels per meter: 384
Integration weight: < 450 & >750
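As a quick sanity check on these settings, the physical extent of the reconstruction volume follows directly from the voxel counts and the voxels-per-meter value; the small sketch below evaluates it for the parameter combinations listed above:

using System;

class ReconstructionVolume
{
    static void Main()
    {
        const double voxelsPerMeter = 384.0;

        // Voxel counts per axis used in this work (the Z axis was either 256 or 384)
        foreach (var (vx, vy, vz) in new[] { (256, 256, 256), (256, 256, 384) })
        {
            double sx = vx / voxelsPerMeter, sy = vy / voxelsPerMeter, sz = vz / voxelsPerMeter;
            double voxelMm = 1000.0 / voxelsPerMeter;
            Console.WriteLine($"{vx}x{vy}x{vz} voxels -> volume {sx:F2} x {sy:F2} x {sz:F2} m, " +
                              $"voxel size {voxelMm:F2} mm");
        }
        // 256/384 ≈ 0.67 m per axis (or 1.00 m for 384 voxels) with ~2.6 mm voxels, enough to
        // cover the ~350-650 mm underwater working range at millimetre-scale resolution.
    }
}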
3.5 Summary
The main methodology of this thesis has been laid out in this chapter. As KinectToF is
not designed for underwater use, a special casing was designed keeping in view the
various factors described in this chapter, and the selection of the transparent sensor
housing material was discussed in detail. The casing was produced using additive
manufacturing techniques and went through rigorous simulation and stress analysis to
verify that it could withstand the pressures of operating under water; the results showed
a large safety factor in the design. Issues regarding camera calibration, and the need for
it with underwater images, were also discussed in detail, and the time-of-flight correction
that accommodates the change in measurement medium was described. The algorithm
developed for distortion removal from the acquired depth data, its mathematics and its
implementation were explained, and the qualitative and quantitative comparisons used
to evaluate the results in the subsequent sections were elaborated. Lastly, details of the
experimental setup, the raw dataset acquired and the considerations taken during data
acquisition were explained.
RESULTS AND DISCUSSION
4.1 Overview
This chapter details the results of the complete algorithm pipeline, starting from data
collection with the Kinect inside the casing, both outside and inside the water. As defined
in section 1.6, the primary objective of this research work was to investigate the
performance of the KinectToF sensor in an underwater environment. Once the
performance had been gauged, an implementation of a real-time solution for underwater
3D reconstruction was tested, and the results were generated and validated. To validate
the results, qualitative and quantitative methods are explored in detail, according to the
methodology defined in section 2.6. The resulting meshes are compared with ground-
truth data collected with the Kinect outside the water, along with comparisons against
parametric 3D meshes. Data were collected at multiple times and under varying lighting
conditions, water conditions and turbidity levels, details of which are given in the
experimental setup section below.
4.2 Performance of KinectToF Sensor in Underwater Environment
Since there is no prior information about the working of KinectToF under water, the
operation and performance of the sensor had to be evaluated before gathering any data
for reconstruction. The change in medium affects the working range and quality of
reconstruction of the sensor. These effects and the consequential performance
changes are discussed in detail in the following sections:
4.2.1 Kinect Depth Camera Performance in Underwater Environment
The most important and novel result, validating any further work, was the proof of
concept that KinectToF can operate in an underwater environment. Since no prior work
had been done on this, quantitative and qualitative results of the Kinect's operation were
of crucial importance.
Figure 4.1: Reported vs actual depth of KinectToF in underwater environment
Since the Kinect ToF camera uses an NIR wavelength of approximately 800-830nm,
the depth measurement capability of the Kinect is strongly attenuated in an underwater
environment due to the natural absorption property of water. There is also a hardware
limitation on the minimum depth that KinectToF can measure, which is fixed at 500mm.
Experimental results showed that the Kinect successfully measures depth and generates
accurate, dense point clouds of its FoV between a reported minimum of 500mm and a
maximum of approximately 850mm, as shown in figure 4.1. However, the reliability of
any depth data below 500mm or beyond 850mm is dubious. This was confirmed by the
results acquired from the depth images, as shown in figure 4.2.
However, the images given in figure 3.3 were acquired at physically different distances
than those reported in the Kinect depth images. The actual working depth of the Kinect is
around ~350 mm to ~650 mm instead of the reported working depth of 500 mm to ~850 mm. This
difference in the depth calculation is due to the change in the working medium of the
sensor. As previously discussed, the time of flight calculation performed by the Kinect
hardware assumes that the sensor is operating in open air. Because of the change in the
working medium, the values reported are not the actual values but are greater than the
actual depth being measured. For a proper reconstruction, the time of flight correction
must therefore be applied to the acquired depth images so that the scene is reconstructed
at the correct depth from the sensor, as detailed in section 3.3.2.
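To make the correction concrete, the sketch below rescales a reported depth frame by the refractive index of water. This is a minimal illustration assuming the correction reduces to a single scaling factor; the actual filter described in section 3.3.2 may differ in detail (for example in how the housing is handled).

```python
import numpy as np

N_WATER = 1.33  # approximate refractive index of water for ~850 nm NIR

def correct_tof_depth(depth_mm: np.ndarray) -> np.ndarray:
    """Rescale Kinect-reported depths to underwater distances.

    The Kinect firmware converts the measured round-trip time to distance
    assuming the speed of light in air; in water light travels roughly 1.33x
    slower, so the reported range over-estimates the true range by that factor.
    """
    corrected = depth_mm.astype(np.float32) / N_WATER
    corrected[depth_mm == 0] = 0  # keep invalid (zero) pixels invalid
    return corrected

# The reported 500-850 mm working band maps back to roughly 375-640 mm:
print(correct_tof_depth(np.array([500.0, 850.0])))  # ~[376, 639]
```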
Figure 4.2: Original depth data reported by KinectToF
4.2.2 Camera Calibration and Distortion Correction Results
From the calibration data acquired, it can be seen that the distortion parameters under
water are greater than those in air, signifying that the distortion is enhanced
significantly when the IR camera is operated under water. The calculated distortion
parameters can then be incorporated into the Kinect Fusion code, which uses pre-coded
calibration parameters by default. By using these parameters, the distortion seen in the
infrared and RGB cameras can be catered for while reconstructing the scene.
Table 4.1: RGB camera calibration results in air and underwater
Parameter                  In air        Underwater
Focal length (fx)          1032.66450    1947.3593
Focal length (fy)          1033.1741     1952.856
Principal point (qx)       972.3426      983.587
Principal point (qy)       532.6476      587.231
Distortion coefficient α   0.08708       1.05432
Distortion coefficient β   -0.16515      -2.08232
Distortion coefficient χ   -0.00321      0.01752
Distortion coefficient ζ   -0.00345      0.016760
Figure 4.3: RGB camera calibration results in air and under water (a) focal length
and principal axis values (b) distortion coefficients
Table 4.2: IR camera calibration results in air and underwater
Parameter                  In air               Underwater
Focal length (fx)          391.096 ± 36.144     717.364895 ± 58.516
Focal length (fy)          463.098 ± 104.984    688.007761 ± 57.642
Principal point (qx)       243.892 ± 9.838      281.306232 ± 6.561
Principal point (qy)       208.922 ± 58.667     300.153537 ± 32.958
Distortion coefficient α   0.134547             1.580700
Distortion coefficient β   -0.241541            -1.827172
Distortion coefficient χ   -0.02839             0.204611
Distortion coefficient ζ   -0.01516             -0.000325
Figure 4.4: IR camera calibration results in air and under water (a) focal length and
principal axis values (b) distortion coefficients
The calibration values were verified by undistorting selected images other than those used
for calibration. The results of the undistorted images are given in Figure 4.5.
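As an illustration of how such parameters are applied, the snippet below undistorts an IR frame with OpenCV using the underwater values from Table 4.2. The mapping of α, β, χ, ζ onto OpenCV's radial (k1, k2) and tangential (p1, p2) coefficients is an assumption made for this sketch, and the file names are hypothetical.

```python
import cv2
import numpy as np

# Underwater IR camera intrinsics (mean values from Table 4.2)
K = np.array([[717.364895, 0.0, 281.306232],
              [0.0, 688.007761, 300.153537],
              [0.0, 0.0, 1.0]])

# alpha, beta taken as radial k1, k2; chi, zeta as tangential p1, p2 (assumed mapping)
dist = np.array([1.580700, -1.827172, 0.204611, -0.000325])

ir = cv2.imread("ir_frame.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input frame
undistorted = cv2.undistort(ir, K, dist)
cv2.imwrite("ir_frame_undistorted.png", undistorted)
```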
4.2.3 Effect of Colour and Material of Scanned Objects
The Kinect works on the principle of NIR reflection from objects. If an object surface
absorbs the incoming ray, or reflects it away from the sensor, then the device cannot
obtain a reading for that pixel.
Figure 4.5: IR image (a) original (b) undistorted using calibration parameters
This results in zero-value pixels in the acquired depth image. The same was true for
objects being scanned under water, with some additional issues. Any object with a black
colour always showed zero-pixel values, irrespective of the material of the object. This is
most likely because the IR source inside the Kinect is not strong enough to obtain
sufficient reflection from such parts. This observation was tested on black rubber shoes,
rusted metal pipes, black plastic objects and other materials. Another observation while
scanning was that rusted areas on metallic objects returned zero values, whereas the
remainder of the painted or non-rusted object resulted in a valid depth value. Since
corrosion estimation is not the focus of this research work, this is left as an area that
can be explored further in the future. Objects such as small plants are hard to reconstruct
from the acquired data, as the amount of noise in the depth data makes it difficult for the
reconstruction to resolve individual leaves. A summary of the limitations of NIR sensing on
the different objects and materials listed in table 3.4 is given below:
Table 4.3: Effect of colour and material of objects on underwater NIR scanning
Material                 Objects                                              Scanning observations
Rubber                   Basketball, Safety shoe                              Black rubber parts do not reflect NIR and return zero values; coloured rubber parts have no negative effect
ABS plastic (painted)    House, Rubik's Cube, Trophy stand, Decorative plant  No observable negative effect on point clouds
Ceramic                  Coffee Mug                                           No observable negative effect on point clouds
Cement                   Cement Brick                                         No observable negative effect on point clouds
4.3 Qualitative and Quantitative Performance Evaluation
To evaluate the reconstructed 3D meshes, the performance criteria described in section
3.3.6 were followed for all meshes during the process. The qualitative results without and
with filtering are described in the subsequent sections. Intermediate results are also
discussed to evaluate the quality of each of the intermediate meshes generated after the
application of each filter.
Figure 4.6: Dense vs sparse point cloud under water
It is worth noting that Kinect Fusion is designed to work on dense point clouds, as it uses
the entire depth frame for alignment. If the point cloud is sparse, then calculating the
local minimum of each frame for alignment is not effective and tracking is lost quite
frequently while scanning, as shown in Figure 4.6. Due to the effects of NIR absorption in
water, significant depth data is lost and the point cloud returned is sparse in nature.
An important consideration while reconstructing the meshes was found to be the density of
the point cloud in the first frame of the mesh reconstruction. Since the ICP algorithm
aligns the frames acquired in succession to the first frame of the reconstruction, the
quality of this first frame is critical for the creation of a large-area mesh.
Specifically, the first frame of the reconstruction should contain dense point cloud data,
so that subsequently acquired frames can be aligned to it using ICP. If the first frame has
a sparse point cloud, the mesh generated covers a much smaller area and should be reset for
better results. This can be done by using the 'Reset Mesh' button in the developed software.
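A guard of this kind can also be automated. The sketch below estimates whether an incoming depth frame is dense enough to serve as the reference (first) frame before reconstruction is started; the 0.6 threshold is illustrative, not a value from the thesis.

```python
import numpy as np

def dense_enough(depth_mm: np.ndarray, min_valid_ratio: float = 0.6) -> bool:
    """Return True when enough pixels carry valid (non-zero) depth for the
    frame to act as the reference frame of the reconstruction."""
    valid_ratio = np.count_nonzero(depth_mm) / depth_mm.size
    return valid_ratio >= min_valid_ratio
```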
To quantitatively analyse the meshes, they were compared with ground truth 3D models of the
objects. Where a 3D model was not available, a 3D scan of the object in open air using the
Kinect was selected as the ground truth for the comparison. The error calculation of the
meshes is done using CloudCompare®, a freely available utility for 3D point cloud and
triangular mesh processing. After aligning the ground truth and the created mesh using ICP,
the residual distance after alignment is calculated for every point and can then be
represented as an error heat map of the mesh. A detailed discussion of the quantitative
performance criteria has already been given in section 3.3.6. The process followed for
qualitatively and quantitatively comparing the 3D meshes generated in the underwater
environment is defined by the sequence of steps given in figure 4.7.
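The same comparison can be scripted outside CloudCompare. The sketch below uses the Open3D library, an assumption since the thesis work used CloudCompare, to align a scanned mesh to its ground truth with ICP and to compute the per-point distances from which the error heat map and statistics are derived; file names and the correspondence distance are hypothetical.

```python
import numpy as np
import open3d as o3d

# Load ground truth and reconstructed meshes (hypothetical file names)
gt_mesh = o3d.io.read_triangle_mesh("ground_truth.ply")
scan_mesh = o3d.io.read_triangle_mesh("underwater_scan.ply")

# Use the mesh vertices as point clouds for ICP and distance computation
gt = o3d.geometry.PointCloud();   gt.points = gt_mesh.vertices
scan = o3d.geometry.PointCloud(); scan.points = scan_mesh.vertices

# Align the scan to the ground truth with point-to-point ICP
reg = o3d.pipelines.registration.registration_icp(
    scan, gt, max_correspondence_distance=0.02,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
scan.transform(reg.transformation)

# Per-point residual distances give the error statistics / heat map values
dist = np.asarray(scan.compute_point_cloud_distance(gt))
print(f"mean error: {dist.mean()*1000:.2f} mm, std dev: {dist.std()*1000:.2f} mm")
```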
Figure 4.7: Steps for 3D mesh generation in underwater environment
In order to compare the results of all the intermediate steps, a detailed comparison of the
results of each of the above steps is presented in the sections below. However, since the
time of flight correction only adds a constant positive distance offset towards the camera
across the entire mesh, the results of the ToF correction are already incorporated within
the results of the refraction correction.
(Figure 4.7 pipeline: Mesh from Kinect Fusion → Camera Calibration → Noise Filtering → Time of Flight Correction → Refraction Correction)
4.3.1 3D Reconstruction in Water by Unfiltered Kinect Fusion
The 3D meshes generated by the original, unaltered Kinect Fusion code, along with the RGB
images of the scene, are given in figure 4.8. As can be seen from the images in figure 4.8,
while the general features of the scene are maintained, the meshes are very noisy. Often,
due to the noise in the data, the Kinect Fusion process fails completely to generate any
mesh, as the ICP algorithm fails to align the acquired point clouds. This results in a
completely noisy, garbage mesh.
Figure 4.8: (a) RGB image of scene (b) 3D reconstruction by Kinect Fusion only
4.3.2 3D Reconstruction Results after Camera Calibration
The effect of camera calibration on the depth data is visible in Figure 4.9, where the tile
lines of the swimming pool are much straighter after incorporating the camera calibration
parameters. The lens distortion correction coefficients calculated using the GML camera
calibration tool were found to be good approximations of the camera parameters. The effect
on other objects was not as visually distinguishable as on the flat wall. There was also a
slight increase in the area of the mesh generated, since camera calibration increases the
area of the point cloud matched by ICP.
Figure 4.9: (a) 3D reconstruction by Kinect Fusion only (b) Mesh after applying
camera calibration
4.3.3 3D Reconstruction Results after Median Filtering
Once the mesh has been acquired in the water, noise filtering can be applied to the depth
data. Results of applying the filter can be seen in figure 4.10. It can be seen that the
additional unwanted noise generated by Kinect Fusion alone has been significantly reduced.
As a result of noise filtering, some sharpness in the mesh is also lost, but because an
overall cleaner point cloud is passed to ICP in every frame, the alignment is easier and
the meshes generated are cleaner and better. Note that the 3D meshes created are only one
voxel thick and are not generated as solids, as the mesh graphics are rendered by
ray-casting, a standard computer graphics scene rendering technique.
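For reference, a per-frame filter of this kind can be written in a few lines. The sketch below applies OpenCV's median filter to a 16-bit depth frame before it is handed to the alignment stage; the 5x5 kernel size is illustrative, not necessarily the one used in the thesis implementation.

```python
import cv2
import numpy as np

def denoise_depth(depth_mm: np.ndarray, ksize: int = 5) -> np.ndarray:
    """Suppress salt-and-pepper noise in a Kinect depth frame with a median
    filter before passing it on to ICP alignment.
    Note: cv2.medianBlur accepts 16-bit input only for 3x3 and 5x5 kernels."""
    return cv2.medianBlur(depth_mm.astype(np.uint16), ksize)
```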
Figure 4.10: Noise filtering results (a) results after camera calibration (b) mesh after
noise filtering
4.3.4 3D Reconstruction Results after ToF and Refraction Corrections
The effects of the time of flight and refraction corrections on a mesh from the previous
step (after applying noise filtering) can be seen in figure 4.11. The objects are much
clearer around the edges, and any aliasing effects on the 3D mesh are removed efficiently.
Figure 4.11: ToF and refraction correction results (a) mesh with median filter
(b) mesh after applying ToF and refraction correction
The effect of the refraction correction can be seen much more clearly when a flat surface
is reconstructed, such as the swimming pool wall shown previously. The pin-cushion
distortion generated when the developed refraction correction filter is not applied is
quite visible, as shown in Table 4.4. This also leads to a poorly reconstructed mesh, as
the alignment cannot be maintained when the camera moves over a greater distance.
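As a rough illustration of the idea behind such a correction, the sketch below bends each back-projected ray at a flat port using Snell's law and re-casts the remaining path in water. It is a simplified geometric sketch only: it assumes a flat interface perpendicular to the optical axis at a hypothetical offset, ignores the acrylic thickness, and is not the exact filter developed in this work.

```python
import numpy as np

N_AIR, N_WATER = 1.0, 1.33

def refraction_correct(points: np.ndarray, interface_z: float = 0.01) -> np.ndarray:
    """Re-cast camera-frame points (metres, z forward) through a flat port.

    points      : (N, 3) array of points already ToF-corrected for water
    interface_z : assumed distance from the optical centre to the air/water
                  interface of the housing (hypothetical value)
    """
    points = np.asarray(points, dtype=np.float64)
    corrected = np.empty_like(points)
    for i, p in enumerate(points):
        ray = p / np.linalg.norm(p)                    # viewing ray in air
        sin_air = np.linalg.norm(ray[:2])              # sine of angle to the port normal
        sin_water = np.clip((N_AIR / N_WATER) * sin_air, -1.0, 1.0)
        cos_water = np.sqrt(1.0 - sin_water ** 2)
        hit = ray * (interface_z / ray[2])             # where the ray meets the port
        remaining = np.linalg.norm(p) - np.linalg.norm(hit)
        lateral = ray[:2] / (sin_air + 1e-12)          # in-plane bend direction
        refracted = np.concatenate([lateral * sin_water, [cos_water]])
        corrected[i] = hit + refracted * remaining     # point along the bent ray
    return corrected
```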
Table 4.4: Front/side view of 3D reconstructed submerged swimming pool wall (each row shows front and side views)
(a) 3D mesh of the submerged wall created by the original Kinect Fusion algorithm. There is a significant amount of noise in the mesh, and the centre portion is bulged significantly inwards due to the refraction effect, while the outer edges are extended outwards.
(b) 3D mesh after applying camera calibration. The central bulge is much flatter, the area of the reconstruction increases slightly and the convexity at the outer edges is significantly reduced.
(c) 3D mesh after applying the median filter to remove the salt-and-pepper noise in the depth images while reconstructing. The reconstruction quality increases significantly and noise is removed; the significant bulge in the centre also becomes clearer.
(d) 3D mesh reconstructed after applying the camera calibration, smoothing, time of flight and refraction correction filters. The inward bulge in the centre is almost gone and the deviation from a flat plane reduces to within ±5 mm.
Figure 4.12: Alignment error maps of the 3D reconstructed mesh of a submerged swimming pool
wall compared with an ideal plane, showing the refraction correction results. Green
represents 0 mm error, red represents ≥ +20 mm error and blue represents ≥ -20 mm error.
(From left to right: ideal reference plane, original Kinect Fusion mesh, after camera
calibration, after median filtering, ToF- and refraction-corrected mesh)
To quantitatively analyse the effect of the proposed refraction correction algorithm,
alignment error maps comparing the scanned meshes with an ideal plane mesh were generated,
as shown in Figure 4.12. Green in the heat map represents 0 to ±5 mm error, red represents
≥ +20 mm error and blue represents ≥ -20 mm error. After applying the median filter, the
quality of the mesh increased significantly; the error decreased and noise was removed. The
significant inward bulge in the centre (blue) due to refraction was reduced after applying
the refraction correction. The mean error of the mesh after applying the refraction
correction was 1.3 mm, with a standard deviation of 14 mm. The slightly higher standard
deviation was due to the large errors along the edges and the scattered mesh points away
from the centre of the depth image.
4.3.5 Results of RGB Image Mapped on 3D Mesh
Kinect Fusion allows full-colour 3D meshes to be generated by mapping the acquired RGB
image onto the mesh in real time. The results can be exported in *.ply format with all
vertex and colour information. Some examples of RGB mapping on the generated mesh are given
in figure 4.13. After the data has been acquired and the 3D meshes have been generated,
they can be stitched together to form a continuous 3D mesh of the entire underwater scene
scanned by the Kinect. Since full-colour 3D reconstruction requires clear RGB images, there
is a significant dependency on the quality of the RGB images and the amount of light
available in the underwater scene; when the colour images acquired are clearer, owing to
the presence of ambient light, they can also be mapped onto the reconstructed mesh. An
overall summary of the reconstruction results is given in table 4.5.
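A colour-mapped mesh of this kind can also be handled with standard tooling. The sketch below attaches per-vertex RGB values to a reconstructed mesh and writes a coloured *.ply file using the Open3D library, which is an assumption here since the thesis exports the mesh directly from the adapted Kinect Fusion application; the file names and random colours are placeholders.

```python
import numpy as np
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("underwater_scan.ply")        # hypothetical exported mesh
colors = np.random.rand(len(mesh.vertices), 3)                  # placeholder for sampled RGB values in [0, 1]
mesh.vertex_colors = o3d.utility.Vector3dVector(colors)
o3d.io.write_triangle_mesh("underwater_scan_colour.ply", mesh)  # PLY preserves per-vertex colour
```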
Figure 4.13: Results of RGB mapping on the generated 3D mesh (a) RGB image
acquired (b) 3D reconstructed scene (c) colour mapped mesh
Table 4.5: Additional object scan results in different conditions
Columns: Object | Original scan in air or 3D model | Underwater scan from Kinect Fusion code | Underwater scan with noise filtering, ToF and refraction correction.
(The table content is image-based. For the swimming pool wall no air scan or 3D model is available; the remaining objects are shown against their 3D models or aerial (in-air) scans.)
Table 4.6: Error heat maps and gaussian distribution of error histogram of various
objects scanned underwater. Objects scanned are compared to original 3D CAD
model as well as with the 3D printed model scanned with KinectToF in the air
Description                                 3D Printed House   3D Printed Trophy Stand   3D Printed Rubik's Cube
Ground truth 3D model of object             (image)            (image)                   (image)
3D mesh scanned in air                      (image)            (image)                   (image)
3D mesh scanned underwater                  (image)            (image)                   (image)
Underwater mesh vs 3D model:
  Alignment error heat map                  (image)            (image)                   (image)
  Error histogram statistics (m)
    Min                                     0.0                0.0                       0.0
    Max                                     0.00335            0.10200                   0.03302
    Avg.                                    0.00076            0.00652                   0.00737
    σ                                       0.00076            0.00735                   0.00533
Underwater mesh vs air scan:
  Alignment error heat map                  (image)            (image)                   (image)
  Error histogram statistics (m)
    Min                                     0.0                0.0                       0.0
    Max                                     0.00335            0.10929                   0.02655
    Avg.                                    0.00076            0.00033                   0.00430
    σ                                       0.00076            0.00684                   0.00399
Table 4.6 gives the alignment error heat maps and the Gaussian distributions of the error
histograms for several objects whose 3D models were available to be used as ground truth.
These objects were 3D printed and then scanned both in air and underwater. The corrected
underwater mesh was then compared with the original 3D model and with the 3D mesh scanned
in air, to generate an error histogram that is represented as a heat map on the 3D mesh for
visual clarity. As can be seen from the results, we were able to achieve a mean error of
±6 mm with an average standard deviation of 3 mm, thereby confirming that the reconstructed
meshes are a close approximation of the objects being scanned. It is worth noting that
Kinect Fusion is designed to work on dense point clouds, as it uses the entire depth frame
for ICP-based alignment. Because of NIR absorption and refraction in water, there was a
significant loss of depth data and the point cloud returned was sparse in nature. When the
point cloud is sparse, calculating the local minimum of each frame for alignment is
difficult, and tracking was therefore frequently lost while scanning. The density of the
point cloud in the first frame of the mesh reconstruction process consequently becomes much
more important, since the ICP algorithm aligns the subsequently acquired frames to the
first frame of the reconstructed scene.
As can be seen from the results above, the meshes generated underwater are significantly
better than those generated by the Kinect Fusion algorithm without any additional
filtering. This provides additional confidence that the developed noise filtering and
refraction correction are working as expected. Without the filtering option enabled, the
reconstructed mesh is very noisy, with random vertices generated around the focused object
or scene, and it is often quite hard to make out the target object in the scene. With
filtering enabled, however, the results are quite satisfactory and the object is not only
distinguishable but also comparable to the original scans in air. While the reconstructed
underwater meshes are not as sharp or clear as the original scans in air, the filters
enable Kinect Fusion to reconstruct meshes under water with much better accuracy than is
possible without them, approaching the quality of the original air scans.
4.4 Comparison with Existing Methods
Even though RGB-D sensors appeared around 2007, there has been very little research on
testing the performance of these cameras in an underwater environment. Only a handful of
the commercially available depth cameras have been evaluated, and only a few have actually
been used fully immersed underwater, as discussed in the literature review in chapter 2. A
comprehensive list of the research done on testing RGB-D cameras underwater is given in
table 2.2. At the time of writing this thesis, the KinectToF had not been used for
underwater depth data acquisition. The closest research on data acquisition by fully
submerged RGB-D sensors is that of Digumarti et al. [14]. However, they used the Intel
RealSense sensor, which is a structured light sensor similar to the KinectSL. A comparison
is given in table 4.7.
Table 4.7: Summary of comparison with similar work
                          Digumarti et al. [14]    Proposed work
Sensor                    Intel RealSense          KinectToF
Technology                Structured Light         Time of Flight
Fusion Technique          InfiniTAM [87]           Kinect Fusion [65]
Refraction correction     Ray-casting based        Ray-tracing based
Camera Calibration
Real-time                 (1 fps)                  (5 - 10 fps)
Scanning distance         200 mm                   350 mm - 450 mm
Even though structured light also uses NIR to illuminate the scene with infrared patterns,
the methodology for calculating the per-pixel depth data differs from that of time of
flight sensors. Furthermore, no data (point clouds, 3D meshes or raw datasets) is available
for making a qualitative comparison of the output of the two methods, so a one-to-one
comparison with their method is not possible. However, a brief comparison of the research
work by Digumarti et al. [14] with our proposed work is given in table 4.7.
We were able to achieve a frame rate of up to 10 fps on a system with a Core-i7 processor,
8 GB RAM, an Nvidia GTX 765M graphics card and 256 GB SSD storage. The increase in mesh
reconstruction performance compared with the method proposed by Digumarti et al. is due to
the change in refraction correction methodology to a faster, ray-tracing-inspired method. A
detailed performance analysis of the application was carried out in Visual Studio to
identify the bottlenecks that could be removed or optimized to increase the performance of
the code and achieve a higher scene reconstruction frame rate. According to the performance
profile, the majority of the computation time is taken up by the WPF framework and by the
data acquisition from the Kinect over USB 3.0. Secondly, of all the implemented filters,
the refraction correction and median filtering affect performance more than the other
filters and the camera calibration. This is because, even though the designed refraction
correction algorithm is efficient, it still performs multiple trigonometric calculations in
every pass. If the code is multi-threaded and parallelized, the performance impact of these
filters can be reduced significantly.
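One possible direction is sketched below: the depth frame is split into horizontal strips that are filtered concurrently, the Python analogue of a Parallel.For pass in the C# implementation. The strip filter shown is only a stand-in for the real per-pixel correction work.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def correct_strip(strip: np.ndarray) -> np.ndarray:
    # Stand-in for the per-pixel refraction / median work on one image strip
    return np.sqrt(strip.astype(np.float32))

def correct_frame_parallel(depth: np.ndarray, workers: int = 4) -> np.ndarray:
    """Filter a depth frame strip-by-strip on a small thread pool."""
    strips = np.array_split(depth, workers, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return np.vstack(list(pool.map(correct_strip, strips)))
```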
4.5 Summary
In order to validate the proposed hypothesis of this research, the performance of the
KinectToF in an underwater environment was tested extensively. The sensor performed
reasonably well, acquiring depth data over a reduced depth range owing to the absorption of
NIR in water. The acquired point clouds were analysed, and the developed time of flight and
refraction correction techniques were tested on the acquired dataset. Extensive RGB, IR and
depth data was collected over several experiments and in different water and lighting
conditions. The algorithms and their performance have been evaluated qualitatively and
quantitatively.
The overall 3D reconstruction of the objects and surfaces being scanned has been found to
be of acceptable quality. Visually, the reconstructed meshes resemble the objects scanned.
ICP registration errors were generally found to be low, with some exceptions, especially at
outlier points of the mesh which form extended vertices in 3D space. The refraction
correction method developed provides real-time performance and caters for the distortions
in the acquired data caused by the different media involved in the data acquisition.
Lastly, since no earlier work exists on the KinectToF or any similar time of flight RGB-D
sensor underwater, a comprehensive quantitative comparison was not possible; however, a
brief comparison with the very few existing methods has been made.
CONCLUSION AND FUTURE WORK
In this thesis, a real-time underwater 3D scene reconstruction technique has been proposed
based on the commercially available KinectToF sensor, an RGB-D time of flight camera. The
Kinect is widely used by roboticists and researchers for 3D scanning of normal open-air
environments. In this research work, the same sensor has been successfully made to work
underwater, an environment it was not designed for.
A special waterproof housing was designed that provides water ingress protection without
diminishing the performance or hindering the operation of the sensor. The complete hardware
design intent and stress analysis simulations were carried out to evaluate the validity of
the design at the higher pressures experienced when it is submerged underwater. The Kinect
was able to successfully acquire data at distances between 350 mm and 650 mm. Camera
calibration for both the RGB and NIR cameras was performed, and a time of flight correction
methodology adjusting the ToF calculations to water was implemented. A refraction
correction technique inspired by standard ray-tracing techniques used in computer graphics
was also developed and tested successfully. Issues faced, such as noise and refraction, and
their countermeasures were discussed at length. Qualitative and quantitative results showed
a mean error of ±6 mm with an average standard deviation of 3 mm while scanning selected
objects with varying material and other properties. The results achieved for aerial
reconstruction versus underwater reconstruction are deliberated in detail and characterized
by comparing them with 3D CAD models and with ground truth meshes scanned in air. A dataset
of the scanned objects was developed and released publicly for further research.
Applications such as coral reef mapping and underwater SLAM in shallow waters for ROV-based
robotic solutions are viable application areas that can benefit from the results achieved
by this research.
5.1 Contributions
Major contributions of this research work are summarized as follows:
An economical RGB-D camera has been shown to perform depth data acquisition underwater,
a much harsher environment than it was designed for. The existing scene reconstruction
algorithm Kinect Fusion has been adapted to work underwater by pre-processing the data
with noise filtering and time of flight correction methods.
A dataset of Kinect 3D point cloud, RGB and IR data has been developed and released
publicly, including data of multiple objects with different characteristics. The dataset
consists of scanning experiments with the KinectToF in open air (ground truth) and in
both clear and turbid underwater setups, with varying levels of lighting.
A fast, ray-tracing-based refraction correction technique has been developed and applied
to the acquired underwater point cloud data. The methodology has shown promising results
in countering the effects of refraction on underwater depth data, achieving a mean error
of ±6 mm with an average standard deviation of 3 mm.
A multi-part, waterproof, easy-to-assemble and 3D-printable housing for the KinectToF has
been developed that can be used for acquiring data underwater. The design has also been
released publicly.
5.2 Limitations and Future Work
The proposed research work adapts a sensor to work in a much harsher underwater environment
than it was designed for. As a consequence, the depth measurement performance of the sensor
is reduced compared with its design parameters, and the distance over which the sensor can
operate limits it to small-scale use only. Since the reconstruction algorithms require
dense point clouds for good alignment and tracking performance, the scanning process has to
be carried out carefully. There is also some loss of detail in the reconstructed meshes,
relative to the original scene, due to the noise encountered in the underwater environment.
However, since the results are promising for real-time small-scale 3D reconstruction, there
are several possibilities for further research and avenues for improvement.
Since there is a scarcity of available underwater imaging datasets, the data acquired for
this research can be of value for several other image processing techniques as well as 3D
reconstruction methods. Techniques such as structure from motion (SfM), and algorithms used
for real-time colour correction and contrast enhancement of underwater imaging data, can
use the acquired dataset for a multitude of purposes. As the refraction correction
methodology is inspired by conventional ray-tracing techniques used in computer graphics, a
parallelized implementation can also be developed for significantly faster refraction
correction in real-time 3D scene reconstruction.
Lastly, improved 3D scene reconstruction algorithms such as Kintinuous, which address the
original limitations of Kinect Fusion such as the memory limit and scene stitching for
larger meshes, can be used on the acquired data together with the refraction correction
technique developed in this research work, so that the effects of refraction are
incorporated.
5.3 List of Publications
Journal Article
A. Anwer, S. S. A. Ali, A. Khan and F. Mériaudeau, " Underwater 3D Scene
Reconstruction Using Kinect v2 Based on Physical Models for Refraction and
Time of Flight Correction”, IEEE Access, 2017 (Q1, IF: 3.244)
Conference Proceedings
A. Anwer, S. S. A. Ali, F. Mériaudeau, "Underwater online 3D mapping and
scene reconstruction using low cost Kinect RGB-D sensor” 6th International
Conference in Intelligent and Advanced Systems (ICIAS), Malaysia, 2016
A. Anwer, S. S. A. Ali, A. Khan, F. Mériaudeau, "Real-time Underwater 3D
Scene Reconstruction Using Commercial Depth Sensor” in IEEE 6th
International Conference on Underwater System Technology: Theory and
Applications (USYS2016), Malaysia, 13-14 December 2016
A. Anwer, S. S. A. Ali, A. Khan, F. Mériaudeau, "Underwater 3D Scanning
Using Kinect V2 Time of Flight Camera” in 13th International Conference on
Quality Control by Artificial Vision (QCAV2017), Japan, 14-16 May 2017
A. Anwer, S. S. A. Ali, F. Mériaudeau, "Customized Graphical User Interface
Implementation of Kinect Fusion for Underwater Application” in IEEE 7th
International Conference on Underwater System Technology: Theory and
Applications (USYS2017), Malaysia, 18-20 December 2017
The software designed for this thesis is a customized implementation of the original Kinect Fusion
available from Microsoft with the Kinect SDK 2.0. The GUI comprises a main window and two
sub-windows. The main window is split into three parts, denoted by (1), (2) and (3) in figure B.1.
The main panel (1) shows the real-time 3D scene reconstruction as the Kinect is moved in 3D space.
The sub-panel (2) has three tabs, each showing the real-time RGB image, IR image and delta from
reference image (more on this below). The third sub-panel (3) shows the real-time depth images
acquired.
Figure B.1: GUI of the developed software and the sub-window launched from the main window
Options to control the various settings and filters are in the sub-window that can be opened by
clicking the 'Settings' button on the main window. The voxel settings per axis, as well as the
voxel density, can be set by the sliders on the lower half of the sub-window. If tracking is lost
during reconstruction by Kinect Fusion, the mesh can be reset by clicking the 'Reset 3D mesh'
button (4); this resumes 3D mesh generation from the currently acquired frame. The current RGB,
depth and IR frames can be saved by clicking the 'Save Screenshot' button (9), which saves the
three time-stamped images to the 'My Pictures' folder by default. The generated 3D mesh can be
saved by clicking the 'Save Point Cloud & 3D mesh' button (10). The mesh can be saved in *.stl,
*.obj or *.ply format. Note that only the PLY format is able to save the mesh with the colour
image mapped on top of it.
The 'Delta from Reference Frame' tab in the main window sub-panel (2) shows the per-pixel,
colour-coded results of the camera tracking alignment. Values vary depending on whether the pixel
was a valid pixel used in tracking or failed one of the tests, and they help to visualize how well
each observed pixel aligns with the passed-in reference frame. Larger magnitude values (either
positive or negative) represent more discrepancy, and lower values represent less discrepancy or
less information at that pixel. The colour coding is described in table B.1.
Table B.1: Delta from reference frame colour coding
Value   Description
White   The input vertex was invalid and had no correspondence between the two point-cloud images.
Green   The outlier vertices were rejected (too large a distance between vertices).
Red     The outlier vertices were rejected (too large a difference in normal angle between the point clouds).
BIBLIOGRAPHY
[1] A. Hogue and M. Jenkin, “Development of an underwater vision sensor for 3D reef mapping,” in International Conference on Intelligent Robots and Systems (IROS), 2006, pp. 5351–5356.
[2] A. Khan, S. S. A. Ali, F. Meriaudeau, A. S. Malik, L. S. Soon, and T. N. Seng, “Visual feedback–based heading control of autonomous underwater vehicle for pipeline corrosion inspection,” International Journal of Advanced Robotic Systems, vol. 14, no. 3, p. 1729881416658171, 2017.
[3] “Fiskardo-Greece 2015 Underwater survey.” [Online]. Available: https://openexplorer.com/expedition/fiskardogreece2015. [Accessed: 02-Nov-2017].
[4] K. McClellan, “Evaluating the Effectiveness of Marine No-Take Reserves in St. Eustatius, Netherlands Antilles,” M.S. Thesis, Nicholas School of the Environment, Duke University, North Carolina, USA, 2009.
[5] “2G Robotics: Underwater Laser Scanners for high-resolution surveying.” [Online]. Available: http://www.2grobotics.com/. [Accessed: 03–01-2017].
[6] “Stanford’s Humanoid Diving Robot Takes on Undersea Archaeology and Coral Reefs.” [Online]. Available: http://spectrum.ieee.org/automaton/robotics/humanoids/stanford-ocean-one-humanoid-diving-robot. [Accessed: 03-Jan-2017].
[7] S. Krupinski, R. Desouche, N. Palomeras, G. Allibert, and M.-D. Hua, “Pool testing of AUV visual servoing for autonomous inspection,” IFAC-PapersOnLine, vol. 48, no. 2, pp. 274–280, 2015.
[8] A. Khan, S. S. A. Ali, A. S. Malik, A. Anwer, N. A. A. Hussain, and F. Meriaudeau, “Control of Autonomous Underwater Vehicle Based on Visual Feedback for Pipeline Inspection,” in Robotics and Manufacturing Automation (ROMA), 2016 2nd IEEE International Symposium on, 2016, pp. 1–5.
[9] J. Adams and J. Rönnby, “One of His Majesty’s ‘Beste Kraffwells’: the wreck of an early carvel-built ship at Franska Sternarna, Sweden,” International Journal of Nautical Archaeology, vol. 42, no. 1, pp. 103–117, 2013.
[10] A. Jaklič, M. Eric, I. Mihajlović, Z. Stopinšek, and F. Solina, “Volumetric models from 3D point clouds: The case study of sarcophagi cargo from a 2nd/3rd century AD Roman shipwreck near Sutivan on island Brač, Croatia,” Journal of Archaeological Science, vol. 62, pp. 143–152, 2015.
[11] S. M. Nornes, M. Ludvigsen, O. Odegard, and A. J. S.Orensen, “Underwater Photogrammetric Mapping of an Intact Standing Steel Wreck with ROV,” IFAC-PapersOnLine, vol. 48, no. 2, pp. 206–211, 2015.
[12] P. Ozog and R. M. Eustice, “Toward long-term, automated ship hull inspection with visual SLAM, explicit surface optimization, and generic graph-sparsification,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 3832–3839.
[13] “3D at Depth .” [Online]. Available: http://www.3datdepth.com/. [Accessed: 05-2016].
[14] S. T. Digumarti, G. Chaurasia, A. Taneja, R. Siegwart, A. Thomas, and P. Beardsley, “Underwater 3D Capture Using a Low-Cost Commercial Depth Camera,” in Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, 2016, pp. 1–9.
[15] E. Lachat, H. Macher, T. Landes, and P. Grussenmeyer, “Assessment and Calibration of a RGB-D Camera (Kinect v2 Sensor) Towards a Potential Use for Close-Range 3D Modeling,” Remote Sensing, vol. 7, no. 10, pp. 13070–13097, 2015.
[16] O. Sacks, Handbook of Chemistry and Physics. CRC press, 1999.
[17] G. M. Hale and M. R. Querry, “Optical constants of water in the 200-nm to 200-µm wavelength region,” Applied optics, vol. 12, no. 3, pp. 555–563, 1973.
[18] C. S. Inc., “Effects of Light Absorption and Scattering in Water Samples on OBS® Measurements,” no. 2Q-Q. 2008.
[19] “Colors at depth.” [Online]. Available: http://forums.watchuseek.com/f74/colors-depth-259540.html#post1889772. [Accessed: 25–12-2016].
[20] A. Khan, F. Meriaudeau, S. S. A. Ali, and A. S. Malik, “Underwater Image Enhancement and Dehazing Using Wavelet Based Fusion for Pipeline Corrosion Inspection,” in Intelligent and Advanced Systems (ICIAS), 2016 6th International Conference on, 2016, pp. 1–5.
[21] A. Khan, S. S. A. Ali, A. S. Malik, A. Anwer, and F. Meriaudeau, “Underwater image enhancement by wavelet based fusion,” in Underwater System Technology: Theory and Applications (USYS), IEEE International Conference on, 2016, pp. 83–88.
[22] A. Jordt, Underwater 3D Reconstruction Based on Physical Models for Refraction and Underwater Light Propagation, no. 2014/2. Department of Computer Science, CAU Kiel, 2014.
[23] R. Rӧttgers, D. McKee, and C. Utschig, “Temperature and salinity correction coefficients for light absorption by water in the visible to infrared spectral region,” Optics express, vol. 22, no. 21, pp. 25093–25108, 2014.
[24] M. D. Aykin and S. Negahdaripour, “On 3-D target reconstruction from multiple 2-D forward-scan sonar views,” in OCEANS, IEEE, 2015, pp. 1–10.
[25] I. Mandhouj, H. Amiri, F. Maussang, and B. Solaiman, “Sonar Image Processing for Underwater Object Detection Based on High Resolution System,” in SIDOP 2012: 2nd Workshop on Signal and Document Processing, 2012, vol. 845, pp. 5–10.
[26] N. Hurtós, X. Cufí, and J. Salvi, “Calibration of optical camera coupled to acoustic multibeam for underwater 3D scene reconstruction,” in OCEANS, IEEE, 2010, pp. 1–7.
[27] J. S. Jaffe, “Underwater Optical Imaging: The Past, the Present, and the Prospects,” IEEE Journal of Oceanic Engineering, vol. 3, no. 40, pp. 683–700, 2015.
[28] F. Bruno, G. Bianco, M. Muzzupappa, S. Barone, and A. Razionale, “Experimentation of structured light and stereo vision for underwater 3D reconstruction,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 66, no. 4, pp. 508–518, 2011.
[29] A. Sarafraz and B. K. Haus, “A structured light method for underwater surface reconstruction,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, pp. 40–52, 2016.
[30] F. Järemo Lawin, “Depth Data Processing and 3D Reconstruction Using the Kinect v2,” M.S. Thesis, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 2015.
[31] L. K. Rumbaugh, E. M. Bollt, W. D. Jemison, and Y. Li, “A 532 nm chaotic Lidar transmitter for high resolution underwater ranging and imaging,” in Oceans-San Diego, 2013, 2013, pp. 1–6.
[32] M. Hammond, A. Clark, A. Mahajan, S. Sharma, and S. Rock, “Automated Point Cloud Correspondence Detection for Underwater Mapping Using AUVS,” in OCEANS’15 MTS/IEEE Washington, 2015, pp. 1–7.
[33] F. Santoso, M. A. Garratt, M. R. Pickering, and M. Asikuzzaman, “3D Mapping for Visualization of Rigid Structures: A Review and Comparative Study,” IEEE Sensors Journal, vol. 16, no. 6, pp. 1484–1507.
[34] I. Rekleitis, J.-L. Bedwani, E. Dupuis, T. Lamarche, and P. Allard, “Autonomous over-the-horizon navigation using LIDAR data,” Autonomous Robots, vol. 34, no. 1–2, pp. 1–18, 2013.
[35] D. McLeod, J. Jacobson, M. Hardy, and C. Embry, “Autonomous inspection using an underwater 3D LiDAR,” in Oceans-San Diego, 2013.
[36] C. Cain and A. Leonessa, “Laser based rangefinder for underwater applications,” in American Control Conference (ACC), 2012, 2012, pp. 6190–6195.
[37] Z. Xie and Y. Wang, “Attenuation property analysis of lidar transmission in seawater,” in Measurement, Information and Control (MIC), 2012 International Conference on, 2012, vol. 2, pp. 1011–1014.
[38] B. Freedman, A. Shpunt, M. Machline, and Y. Arieli, “Depth Mapping Using Projected Patterns,” US Patent 8 50142, 13 May 2010.
[39] A. Shpunt, “Depth mapping using multi-beam illumination,” US Patent 8 350 847, 28 Jan 2010.
[40] J. Garcia and Z. Zalevsky, “Range Mapping Using Speckle Decorrelation,” US Patent 7 433 024, 7 Oct 2008.
[41] J. Sell and P. O’Connor, “The xbox one system on a chip and kinect sensor,” IEEE Micro, vol. 34, no. 2, pp. 44–53, 2014.
[42] C. Hertzberg and U. Frese, “Detailed modeling and calibration of a time-of-flight camera,” in Informatics in Control, Automation and Robotics (ICINCO), 2014 11th International Conference on, 2014, vol. 1, pp. 568–579.
[43] K. Khoshelham, “Accuracy analysis of kinect depth data,” in ISPRS workshop laser scanning, 2011, vol. 38, no. 5, pp. 12–18.
[44] K. Shifrin, Physical optics of ocean water. American Institute of Physics, New York, 1988.
[45] J. A. Curcio and C. C. Petty, “The near infrared absorption spectrum of liquid water,” JOSA, vol. 41, no. 5, pp. 302–304, 1951.
[46] C.-L. Tsui, D. Schipf, K.-R. Lin, J. Leang, F.-J. Hsieh, and W.-C. Wang, “Using a Time of Flight method for underwater 3-dimensional depth measurements and point cloud imaging,” in OCEANS, IEEE, 2014, pp. 1–6.
[47] A. Dancu, M. Fourgeaud, Z. Franjcic, and R. Avetisyan, “Underwater reconstruction using depth sensors,” in SIGGRAPH Asia 2014 Technical Briefs, 2014, p. 2.
[48] T. Butkiewicz, “Low-cost coastal mapping using Kinect v2 time-of-flight cameras,” in Oceans-St. John’s, 2014, 2014, pp. 1–9.
[49] H. Lu, Y. Zhang, Y. Li, Q. Zhou, R. Tadoh, T. Uemura, H. Kim, and S. Serikawa, “Depth Map Reconstruction for Underwater Kinect Camera Using Inpainting and Local Image Mode Filtering,” IEEE Access, 2017.
[50] H. Sarbolandi, D. Lefloch, and A. Kolb, “Kinect range sensing: Structured-light versus Time-of-Flight Kinect,” Computer Vision and Image Understanding, vol. 139, pp. 1–20, 2015.
[51] B. Langmann, K. Hartmann, and O. Loffeld, “Depth Camera Technology Comparison and Performance Evaluation.,” in ICPRAM (2), 2012, pp. 438–444.
[52] “Xbox 360 Kinect Teardown.” [Online]. Available: https://www.ifixit.com/Teardown/Xbox+360+Kinect+Teardown/4066. [Accessed: 24–03-2016].
[53] H. Gonzalez-Jorge, P. Rodriguez-Gonzálvez, J. Martinez-Sánchez, D. González-Aguilera, P. Arias, M. Gesto, and L. Diaz-Vilariño, “Metrological Comparison Between Kinect I and Kinect II Sensors,” Measurement, vol. 70, pp. 21–26, 2015.
[54] “Xbox One Kinect Teardown.” [Online]. Available: https://www.ifixit.com/Teardown/Xbox+One+Kinect+Teardown/19725.
[55] L. Shao, J. Han, P. Kohli, and Z. Zhang, Computer vision and machine learning with RGB-D sensors. Springer, 2014.
[56] M. Gesto Diaz, F. Tombari, P. Rodriguez-Gonzalvez, and D. Gonzalez-Aguilera, “Analysis and Evaluation Between the First and the Second Generation of RGB-D Sensors,” Sensors Journal, IEEE, vol. 15, no. 11, pp. 6507–6516, 2015.
[57] M. Bueno, L. Diaz-Vilariño, J. Martinez-Sánchez, H. González-Jorge, H. Lorenzo, and P. Arias, “Metrological evaluation of KinectFusion and its comparison with Microsoft Kinect sensor,” Measurement, vol. 73, pp. 137–145, 2015.
[58] C. D. Mutto, P. Zanuttigh, and G. M. Cortelazzo, Time-of-flight Cameras and Microsoft Kinect (TM). Springer Publishing Company, Incorporated, 2012.
[59] H. Gonzalez-Jorge, B. Riveiro, E. Vazquez-Fernandez, J. Martinez-Sánchez, and P. Arias, “Metrological evaluation of Microsoft Kinect and Asus Xtion sensors,” Measurement, vol. 46, no. 6, pp. 1800–1806, 2013.
[60] N. M. DiFilippo and M. K. Jouaneh, “Characterization of different Microsoft Kinect sensor models,” Sensors Journal, IEEE, vol. 15, no. 8, pp. 4554–4564, 2015.
[61] M. Andersen, T. Jensen, P. Lisouski, A. Mortensen, M. Hansen, T. Gregersen, and P. Ahrendt, “Kinect Depth Sensor Evaluation for Computer Vision Applications,” Technical Report Electronics and Computer Engineering, vol. 1, no. 6, 2015.
[62] A. Corti, S. Giancola, G. Mainetti, and R. Sala, “A metrological characterization of the Kinect V2 time-of-flight camera,” Robotics and Autonomous Systems, vol. 75, pp. 584–594, 2016.
[63] E. Lachat, H. Macher, M. Mittet, T. Landes, and P. Grussenmeyer, “First experiences with kinect v2 sensor for close range 3d modelling,” The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 40, no. 5, p. 93, 2015.
[64] A. Maykol Pinto, P. Costa, A. P. Moreira, L. F. Rocha, G. Veiga, and E. Moreira, “Evaluation of Depth Sensors for Robotic Applications,” in Autonomous Robot Systems and Competitions (ICARSC), 2015 IEEE International Conference on, 2015, pp. 139–143.
[65] R. A. Newcombe, A. J. Davison, S. Izadi, P. Kohli, O. Hilliges, J. Shotton, D. Molyneaux, S. Hodges, D. Kim, and A. Fitzgibbon, “KinectFusion: Real-time dense surface mapping and tracking,” in Mixed and augmented reality (ISMAR), 2011 10th IEEE international symposium on, 2011, pp. 127–136.
[66] “Using Kinfu Large Scale to generate a textured mesh.” [Online]. Available: http://pointclouds.org/documentation/tutorials/using_kinfu_large_scale.php. [Accessed: 12-3/17].
[67] T. Whelan, J. Mcdonald, M. Kaess, M. Fallon, H. Johannsson, and J. J. Leonard, “Kintinuous: Spatially Extended KinectFusion,” in 3rd RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2012.
[68] M. Solony, “Scene reconstruction from kinect motion,” in Proceeding of the 17th conference and competition student EEICT, 2011.
[69] K. Jahrmann, “3D Reconstruction with the Kinect-Camera,” M.S. thesis, Faculty of Informatics., Vienna University of Technology, Vienna, Austria, 2013.
[70] F. Endres, J. Hess, N. Engelhard, J. Sturm, D. Cremers, and W. Burgard, “An evaluation of the RGB-D SLAM system,” in Robotics and Automation (ICRA), 2012 IEEE International Conference on, 2012, pp. 1691–1696.
[71] A. Handa, T. Whelan, J. McDonald, and A. J. Davison, “A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM,” in Robotics and Automation (ICRA), 2014 IEEE International Conference on, 2014, pp. 1524–1531.
[72] S. Kim and J. Kim, “Occupancy mapping and surface reconstruction using local gaussian processes with kinect sensors,” IEEE TRANSACTIONS ON CYBERNETICS, vol. 43, no. 5, pp. 1335–1346, 2013.
[73] Z. Zhu and S. Donia, “Spatial and visual data fusion for capturing, retrieval, and modeling of as-built building geometry and features,” Visualization in Engineering, vol. 1, no. 1, pp. 1–10, 2013.
[74] “Optical & Transmission Characteristics - Plexiglass.com.” [Online]. Available: http://www.plexiglas.com/export/sites/plexiglas/.content/medias/downloads/sheet-docs/plexiglas-optical-and-transmission-characteristics.pdf. [Accessed: 24–03-2016].
[75] M. Bodmer, N. Phan, M. Gold, D. Loomba, J. Matthews, and K. Rielage, “Measurement of optical attenuation in acrylic light guides for a dark matter detector,” Journal of Instrumentation, vol. 9, no. 02, p. P02002, 2014.
[76] “Camera Calibration and 3D Reconstruction.” [Online]. Available: http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html. [Accessed: 26–03-2017].
[77] “Correcting Barrel Distortion.” [Online]. Available: http://www.panotools.org/dersch/barrel/barrel.html. [Accessed: 03-Jan-2017].
[78] “GML C++ Camera Calibration Toolbox.” [Online]. Available: http://graphics.cs.msu.ru/en/node/909. [Accessed: 10–02-2017].
[79] Q. Zhang, L. Xu, and J. Jia, “100+ times faster weighted median filter (WMF),” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2830–2837.
[80] T. Huang, G. Yang, and G. Tang, “A fast two-dimensional median filtering algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 1, pp. 13–18, 1979.
[81] S. Perreault and P. Hébert, “Median filtering in constant time.,” IEEE Trans Image Process, vol. 16, no. 9, pp. 2389–94, 2007.
[82] A. Sobiecki, H. C. Yasan, A. C. Jalba, and A. C. Telea, “Qualitative comparison of contraction-based curve skeletonization methods,” in International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, 2013, pp. 425–439.
[83] P. Cignoni, C. Rocchini, and R. Scopigno, “Metro: measuring error on simplified surfaces,” in Computer Graphics Forum, 1998, vol. 17, no. 2, pp. 167–174.
[84] “CloudCompare: 3D point cloud and mesh processing software” [Online]. Available: http://www.danielgm.net/cc/. [Accessed: 21–01-2017].
[85] P. A. Laplante, “Real-time imaging,” IEEE Potentials, vol. 23, no. 5, pp. 8–10, 2004.
[86] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and others, “KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera,” in Proceedings of the 24th annual ACM symposium on User interface software and technology, 2011, pp. 559–568.
[87] O. Kahler, V. ~A. Prisacariu, C. ~Y. Ren, X. Sun, P. ~H. ~S Torr, and D. ~W. Murray, “Very High Frame Rate Volumetric Integration of Depth Images on Mobile Device,” IEEE Transactions on Visualization and Computer Graphics (Proceedings International Symposium on Mixed and Augmented Reality 2015, vol. 22, no. 11, 2015.