JAYAMAHA ET AL: VISION BASED NON-INVASIVE TOOL – PNCTM; VOL. 2, JAN 2013
Vision Based Non-Invasive Tool for Facial Swelling Assessment
N. A. Jayamaha, P. C. Amarasingha, K. Thanansan, S. Ajanthan, and C. D. Silva
Abstract — An algorithm to quantify facial swelling by reconstructing a 3D model of the face from stereo images is presented. We analyzed the primary problems in computational stereo, namely correspondence and depth calculation. Work was carried out to determine suitable methods for depth estimation and for standardizing volume estimates. Finally, we designed software for reconstructing 3D models from 2D stereo images, built on Matlab and Visual C++. Using techniques from multi-view geometry, a 3D model of the face was constructed and refined. An explicit analysis of stereo disparity calculation methods, together with filtering out unreliable disparity estimates, was used to increase the reliability of the disparity map. Minimizing variability in camera position through more precise positioning techniques and resources will increase the accuracy of this technique and is a focus for future work.
Keywords — Calibration, Rectification, Disparity map, Depth map, 3D Reconstruction
I. INTRODUCTION
One major problem in surgery planning and diagnosis is the amount of manual work involved, such as taking facial and skull parameters and modelling parts of the skull using Plaster of Paris. Software is available for this purpose, but it is quite expensive and requires high-end technologies such as CT (Computed Tomography) or MRI (Magnetic Resonance Imaging) scanners as input devices. Moreover, almost all such software is built using parametric values defined for either Europeans or Americans.
The proposed system has many interesting applications in the surgical field. The main aim is to quantify swelling following facial surgery, and this could well be extended to surgery performed on other parts of the body. The main intention of this project is to develop a system that quantifies facial swelling and provides useful information to its users (doctors) for monitoring post-surgical swelling in a simpler, more accurate and more objective manner. Further, it could become a useful surgical research tool for the comparative evaluation of different surgical techniques for efficacy and safety.
3D reconstruction from stereo images is a difficult task; sensitivity to light, shadow and other changes in the images has a major effect on the result. Robust algorithms and filters have to be used to raise the accuracy and performance of the system.
The system identifies pixel details of both stereo images and produces a depth map. The depth and disparity plots give a fair idea of the relative positions of the various objects in the images; to improve the user's understanding, we recover the lost characteristics of the image by warping the intensity and colour values of the image onto the disparity and plotting it in a 3D view.
D. N. A. Jayamaha, P. C. Amarasingha, K. Thanansan, S.
Ajanthan and C. DeSilva are with the Computer Science &
Engineering Department, University of Moratuwa, Sri Lanka
(phone: 071-421-4340; e-mail: [email protected],
[email protected], [email protected],
[email protected], [email protected]).
II. GENERAL 3D RECONSTRUCTION APPROACHES
For the past decade the majority of 3D reconstruction research has focused on recognition from single-frame, frontal-view 2D face images of the subject. While there has been significant success in this area using techniques such as eigenfaces and elastic bunch graph matching, several issues look set to remain unsolved by such approaches, including current algorithms' inability to deal robustly with large changes in head pose and illumination.
In recent years, a mixture of non-contact, optically based 3D data acquisition techniques has been developed that can be applied to the imaging of humans. A wide variety of commercial and non-commercial off-the-shelf devices for 3D optical sensing are available and can be categorized as follows:
A. Morphable Modelling
Huang, Blanz and Heisele propose a 3D recognition
solution which utilizes a morphable 3D head model to
synthesize training images under a variety of conditions
[1]. The main idea behind the solution is that given a
sufficiently large database of 3D face models any
arbitrary face can be generated by morphing models
already in the database. In the recognition stage of their
work a component based face recognition system is used.
B. Euclidean Transformation
Based on available 3D data, classical approaches to this problem usually attempt to find a Euclidean transformation which maximizes a given shape similarity measure. Irfanoglu, Gokberk and Akarun [2] use a discrete approximation of the volume differences between facial surfaces as their Euclidean similarity measure. In
contrast Bronstein, Bronstein and Kimmel [3] propose an
alternative to this solution where they choose an internal
face representation which is invariant to isometric
distortions.
Invariance to isometric distortions allows the
recognition system to be highly tolerant to changes in
expression; this is in contrast to classical techniques
which are more suited for matching rigid objects due to
the nature of the Euclidean transformations most often
used.
C. Time of Flight Radar
Time-of-flight approaches include optical, sonar, and microwave radar, which typically calculate distances to objects by measuring the time required for a pulse of light, sound, or microwave energy to return from an object [4]. Good results are obtained for large objects. For smaller objects, however, high-speed timing circuitry is required to measure the time of flight, since the time differences to be detected are in the 10^-12 second range for about 1 mm accuracy. Unfortunately, making direct measurements of time intervals to better than 10 picosecond accuracy (roughly 1/8 inch of light travel) remains relatively expensive.
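The timing figures above can be checked with a quick back-of-envelope calculation (a sketch; the 1/8 inch figure corresponds to the distance light covers one way in 10 ps):

```python
# Back-of-envelope check of the time-of-flight numbers quoted above.
C = 299_792_458.0  # speed of light, m/s

def round_trip_time(dz_m: float) -> float:
    """Timing resolution needed to resolve a depth difference dz (pulse goes out and back)."""
    return 2.0 * dz_m / C

def one_way_distance(dt_s: float) -> float:
    """Distance light covers in dt seconds."""
    return C * dt_s

dt_1mm = round_trip_time(1e-3)       # ~6.7e-12 s: picosecond regime, as stated
d_10ps = one_way_distance(10e-12)    # ~3 mm, roughly 1/8 inch
print(f"{dt_1mm:.2e} s, {d_10ps * 1000:.2f} mm")
```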
D. Laser Scanning Triangulation
One of the most widely accepted 3D data acquisition techniques successfully applied to object surface measurement is laser scanning triangulation. The technique involves projecting a stripe of laser light onto the object of interest and viewing it from an offset camera. Deformations in the image of the light stripe correspond to the topography of the object under the stripe, which is thereby measured.
Many commercial products for 3D acquisition have been released employing laser scanning triangulation methods. Cyberware [6] developed 3D scanners based on this technology which have been used by the movie industry to create special effects.
E. Coded Structured Light
Coded structured light systems are based on projecting a light pattern instead of a single stripe and imaging the illuminated scene from one or more viewpoints [5]. This eliminates the need to scan across the surface of the object, as laser scanners must.
The objects in the scene contain a certain coded
structured pattern that allows a set of pixels to be easily
distinguishable by means of a local coding strategy. The
3D shape of the scene can be reconstructed from the
decoded image points by applying triangulation. Most of
the existing systems project a stripe pattern, since it is
easy to recognize and sample.
III. STEREO VISION
The most obvious technique for 3D reconstruction is stereo vision. Stereo imaging uses two cameras and finds the depth of each image point from the cameras, analogous to the human visual system. Computers accomplish this by finding correspondences between points seen by one imager and the same points seen by the other imager. With such correspondences and a known baseline separation between the cameras, the 3D locations of the points can be computed. Although the search for corresponding points is computationally expensive, it can be narrowed down by a process called rectification.
In practice, stereo imaging involves four steps when using two cameras [6].
1. Mathematically remove radial and tangential lens distortion, a step called undistortion. The outputs of this step are undistorted images.
2. Adjust for the angles and distances between the cameras, a process called rectification. The outputs of this step are images that are row-aligned (a point in one image lies in the same row of the second image) and rectified.
3. Find the same features in the left and right camera views, a process known as correspondence. The output of this step is a disparity map, where the disparities are the differences in the x-coordinates, on the left and right image planes, of the same feature viewed in the two cameras: (xl - xr).
4. If we know the geometric arrangement of the cameras, then we can turn the disparity map into distances by triangulation. This step is called reprojection, and the output is a depth map.
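For a rectified pair, the final step reduces to similar-triangles triangulation: a point with disparity d = xl - xr, baseline B and focal length f (in pixels) lies at depth Z = f * B / d. A minimal sketch (the numbers are made up for illustration):

```python
def depth_from_disparity(d_px: float, focal_px: float, baseline: float) -> float:
    """Depth by triangulation on a rectified stereo pair: Z = f * B / d.

    Depth comes out in the same unit as the baseline.
    """
    if d_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline / d_px

# Example: 800 px focal length, 6 cm baseline, 10 px disparity -> 480 cm depth.
print(depth_from_disparity(10.0, 800.0, 6.0))
```

Note that depth is inversely proportional to disparity: nearer objects produce larger disparities.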
A. Stereo Calibration
To accomplish undistortion, we first have to calibrate the cameras and find their parameters. For stereo calibration, a set of chessboard images is taken simultaneously with both cameras. To find the position of any corner we only need to know how many horizontal and vertical squares the chessboard has and the size of a square. The chessboard in the image is a 9x6 chessboard, and if printed on A4 paper the squares are roughly 2.5 cm. The following OpenCV function is used for finding and drawing the corners of the chessboard images:
cvFindChessboardCorners( image, board_sz, corners, &corner_count,
                         CV_CALIB_CB_ADAPTIVE_THRESH | CV_CALIB_CB_FILTER_QUADS );
Figure 1 : Corners found in stereo calibration
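Calibration also needs the 3D positions of those corners in the chessboard's own frame; as noted above, these follow from just the corner grid size and the square size. A small sketch of that bookkeeping (pure Python; 9x6 is taken here to mean the grid of inner corners, with 2.5 cm squares assumed):

```python
def chessboard_object_points(cols: int, rows: int, square: float):
    """3D corner coordinates in the board frame: Z = 0, X/Y on a regular grid."""
    return [(c * square, r * square, 0.0)
            for r in range(rows) for c in range(cols)]

pts = chessboard_object_points(9, 6, 2.5)   # 9x6 inner corners, 2.5 cm squares
print(len(pts), pts[0], pts[-1])            # 54 corners; last at (20.0, 12.5, 0.0)
```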
B. Stereo Rectification
Stereo disparity is easiest to compute when the two image planes are exactly aligned. Unfortunately, in real-world applications a perfectly aligned configuration is rare, since the two cameras of a real stereo system almost never have exactly coplanar, row-aligned imaging planes. There are many ways to compute the rectification terms, of which OpenCV implements two:
(1) Hartley's algorithm [Hartley98], which can yield uncalibrated stereo rectification using just the fundamental matrix [6].
(2) Bouguet's algorithm, which uses the rotation and translation parameters from two calibrated cameras [6].
C. Finding Correspondence
Image points are matched across stereo image pairs and then reconstructed in three dimensions. The most common class of correspondence measures are pixel-based algorithms [7, 8], which compare similarity between pixels across images in order to deduce likely matching image points. The problem of matching 2D camera projections of real-world points across stereo image pairs leads to a host of additional issues, including input point selection and "good" match selection. Keller conducts a comprehensive evaluation of matching algorithms and match quality measures in [9]. Additional work containing a comprehensive evaluation of a large number of correspondence algorithms can be found in [10].
A number of solutions to the stereo correspondence problem have been proposed that operate on the camera input in the frequency domain. Frequency domain approaches are typically attractive because of their processing speed and inherent sub-pixel accuracy [11]. Figure 2 shows a representation of stereo projection.
Matching of a 3D point in the two different camera views can be computed only over the visual area in which the views of the two cameras overlap. To maximize the overlap, we arranged our cameras to be as close to frontal parallel as possible.
We can calculate the disparity as

    d = xl - xr,  or  d = xl - xr - (cx left - cx right)   (1)

if the principal rays intersect at a finite distance (in our case this is true), where
xl - x-coordinate of a point in the left image
xr - x-coordinate of the corresponding point in the right image.
Points in two dimensions can then be reprojected into three dimensions given their screen (image) coordinates, the disparity and the camera intrinsic matrix.
The reprojection matrix Q is:

    Q = [ 1    0    0        -cx         ]
        [ 0    1    0        -cy         ]
        [ 0    0    0         f          ]
        [ 0    0  -1/Tx  (cx - cx')/Tx   ]   (2)

Here the parameters are from the left image, except for cx', which is the principal point x-coordinate in the right image. Tx is the translation of the two cameras in the x direction.
Given a two-dimensional homogeneous point (x, y) of the left image and its associated disparity d, we can project the point into three dimensions using the matrix equation

    Q [x  y  d  1]^T = [X  Y  Z  W]^T   (3)

The 3D coordinates are then (X/W, Y/W, Z/W).
Instead of using Q directly, the real-world coordinates (X, Y, Z) can be computed using the following equations (equivalent to using the Q matrix):

    Z = (f · Tx) / d,   X = (xl · Tx) / d,   Y = (yl · Tx) / d   (4)

where
d  : disparity of the particular pixel
xl : left-image x-coordinate of the particular pixel (relative to the principal point)
yl : left-image y-coordinate of the particular pixel (relative to the principal point)
Tx : translation of the left camera with respect to the right camera in the x direction
f  : focal length in pixels of the left camera after rectification
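As a sanity check, the direct formulas can be compared against homogeneous reprojection through Q. The sketch below uses made-up intrinsics (f = 800 px, cx = cx' = 320, cy = 240, Tx = 6 cm) and plain Python; coordinates are taken relative to the principal point, and the Q route returns the mirrored sign because Tx enters Q with the opposite sign convention, so the magnitudes should agree:

```python
def reproject_via_q(x, y, d, f, cx, cy, cxp, tx):
    """Homogeneous reprojection: Q @ [x, y, d, 1]^T, then divide by W."""
    q = [
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, -1.0 / tx, (cx - cxp) / tx],
    ]
    v = [x, y, d, 1.0]
    X, Y, Z, W = (sum(q[r][c] * v[c] for c in range(4)) for r in range(4))
    return X / W, Y / W, Z / W

f, cx, cy, cxp, tx = 800.0, 320.0, 240.0, 320.0, 6.0
x, y, d = 420.0, 240.0, 10.0

X, Y, Z = reproject_via_q(x, y, d, f, cx, cy, cxp, tx)
# Direct formulas, with image coordinates taken relative to the principal point:
Xd, Yd, Zd = (x - cx) * tx / d, (y - cy) * tx / d, f * tx / d
print((X, Y, Z), (Xd, Yd, Zd))
```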
Figure 2: Stereo Projection
Figure 2 : Original Image & Disparity Map
The X, Y, Z values are the coordinates of a corner in the unit chosen during calibration: the length of the chessboard square can be given in any unit, and we selected cm (the square is 2.2 cm).
1) Extended SAD
In this method we used a block-matching technique, constructing a 3D cost array for every disparity. After iterative application of averaging filtering for each disparity, we selected the disparity d with the minimum cost C(i, j, d) as the most reliable disparity estimate for pixel (i, j) of the disparity map.

    C(i, j, d) = Σ_{(m,n) ∈ W} | Il(i + m, j + n) − Ir(i + m, j + n − d) |   (5)

Step 1: For every disparity d in the disparity search range, calculate the 3D cost array for every window.
Step 2: Apply average filtering iteratively to every 3D array calculated for a disparity value in the search range.
Step 3: For every pixel (i, j), find the minimum cost C(i, j, d) and assign its disparity index d to d(i, j), which is called the disparity map.

    C'(i, j, d) = (1/9) Σ_{m=−1}^{1} Σ_{n=−1}^{1} C(i + m, j + n, d)   (6)

For a w × w window the averaging filter value is 1/w² (1/9 for the 3 × 3 window above). The averaging (linear) filter removes very sharp changes in energy, which most likely belong to incorrect matches.
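The three steps above can be sketched in a few lines of pure Python on a tiny synthetic pair. The left image is a simple horizontal intensity ramp (chosen so the arithmetic is checkable by hand) and the right image is the left shifted by a known disparity of 2 px, which winner-take-all matching should recover:

```python
def extended_sad(left, right, max_d, ws=1):
    """Winner-take-all SAD block matching with one 3x3 averaging pass per
    cost slice, following steps 1-3 above (a simplified sketch)."""
    h, w = len(left), len(left[0])
    # Step 1: SAD cost over a (2*ws+1)^2 window, for every disparity.
    cost = {}
    for d in range(max_d + 1):
        for i in range(ws, h - ws):
            for j in range(ws + max_d, w - ws):   # keep every d in-bounds
                cost[i, j, d] = sum(
                    abs(left[i + m][j + n] - right[i + m][j + n - d])
                    for m in range(-ws, ws + 1) for n in range(-ws, ws + 1))
    # Step 2: 3x3 averaging of each disparity slice (the linear filter of eq. 6).
    smooth = {}
    for (i, j, d) in cost:
        nb = [cost[i + m, j + n, d] for m in (-1, 0, 1) for n in (-1, 0, 1)
              if (i + m, j + n, d) in cost]
        smooth[i, j, d] = sum(nb) / len(nb)
    # Step 3: for every pixel keep the disparity with minimum smoothed cost.
    return {(i, j): min(range(max_d + 1), key=lambda d: smooth[i, j, d])
            for (i, j, d) in cost if d == 0}

# Synthetic pair: right image shifted so that x_left - x_right = 2 everywhere.
H, W, TRUE_D = 8, 16, 2
left = [[10 * x + y for x in range(W)] for y in range(H)]
right = [[left[y][min(x + TRUE_D, W - 1)] for x in range(W)] for y in range(H)]
dmap = extended_sad(left, right, max_d=4)
print(all(d == TRUE_D for d in dmap.values()))
```

A real implementation would of course use textured images and larger search ranges; the ramp merely makes the unique zero-cost match at d = 2 easy to verify.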
This function takes its parameters from the input images, and the user can manually set the maximum disparity value and the window size.

function [spdmap, dcost, pcost, wcost] = stereomatch(imgleft, imgright, windowsize, disparity, spacc)

Set the corresponding parameters for the calculation:

WS  = uint16(windowsize);      % window size
WS2 = uint16((WS - 1) / 2);    % half window
D   = uint16(disparity) + 1;   % number of disparities

Then initialize the arrays needed for the disparity calculation:

pcost = zeros(heightL, widthL, D, 'uint8');
wcost = zeros(heightL, widthL, D, 'single');
dmap  = zeros(heightL, widthL, 'uint8');
dcost = zeros(heightL, widthL, 'single');

h = zeros(WS, WS, 'double');
h(1,1) = 1; h(1,WS) = -1; h(WS,1) = -1; h(WS,WS) = 1;

Now calculate the pixel cost:

for Dc = 1 : D
    maxL = widthL + 1 - Dc;
    pcost(:, Dc : widthL, Dc) = imabsdiff( imgright(:, 1 : maxL), imgleft(:, Dc : widthL) );
end

Here Dc runs from 1 to D, where D is the maximum disparity from the input, and maxL = widthL + 1 - Dc.
The system shows the resulting volume of the facial area with respect to the polygon created by three user-specified biometric points selected with the navigating cursor. To find the volume of the swelling we subtract the system-produced volume values on the corresponding days. The system-calculated volume is measured in cm3.
2) Depth from Disparity
To find depth, a calibration is done with the help of the disparity values generated by the above method. A curve is fitted by regression analysis to known depths and disparities.
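Since triangulation gives Z ∝ 1/d, one way to realise this calibration (a sketch; the paper does not specify the regression model, so a least-squares fit of depth against 1/disparity through the origin is assumed here) is:

```python
def fit_depth_model(disparities, depths):
    """Least-squares fit of Z = k / d: regress Z on u = 1/d through the
    origin, so k = sum(Z_i * u_i) / sum(u_i^2)."""
    u = [1.0 / d for d in disparities]
    return sum(z * ui for z, ui in zip(depths, u)) / sum(ui * ui for ui in u)

# Synthetic calibration data generated from a known constant k = f * B = 4800.
ds = [5.0, 8.0, 10.0, 16.0, 20.0]
zs = [4800.0 / d for d in ds]
k = fit_depth_model(ds, zs)
print(k, k / 10.0)   # fitted constant, and predicted depth at d = 10
```

In practice the known depths would come from measuring a target at several distances, and the fitted constant then converts any disparity to depth.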
Figure 4 : Final Release
Figure 3 : Left & Right image inputs to draw graph
3) Results
Thus, using the mapped depth estimation, the swelling volume can be defined as

    V = Σ_{p ∈ P} ( db(p) − df(p) ) · a   (7)

where P is the set of points inside the specified area of interest, db is the depth before swelling, df is the depth after swelling, and a is the calculated pixel area.
For experimental purposes we improvised facial swellings by asking our participants to keep two 'Alpenliebe' toffees inside the mouth.
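Equation (7) amounts to summing the per-pixel change in depth over the marked region and scaling by the ground area each pixel covers. A toy sketch (flattened depth maps, made-up numbers):

```python
def swelling_volume(depth_before, depth_after, mask, pixel_area):
    """Eq. (7): sum the per-pixel depth change over the region of interest,
    scaled by the area each pixel represents. A positive result means the
    surface moved toward the camera (i.e. swelled)."""
    return sum((db - df) * pixel_area
               for db, df, m in zip(depth_before, depth_after, mask) if m)

# Toy example: 4 masked pixels, each covering 0.25 cm^2; the surface moved
# 2 cm closer to the camera over the whole masked region -> 2 cm^3.
before = [50.0, 50.0, 50.0, 50.0, 50.0]
after  = [48.0, 48.0, 48.0, 48.0, 50.0]
mask   = [True, True, True, True, False]
print(swelling_volume(before, after, mask, 0.25))
```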
TABLE I
CONSIDERED GEOMETRIC POINTS

Considered geometric points          | Tip of the ear, top of   | Tip of the ear, top of
                                     | the nose and corner of   | the nose and tip of
                                     | lips                     | the chin
-------------------------------------+--------------------------+------------------------
System volume for a swelling         | 2.38e+66                 | 1.28e+70
improvised by two 'Alpenliebe'       |                          |
toffees (measured in a system-       |                          |
specific unit for volume)            |                          |
-------------------------------------+--------------------------+------------------------
Standardised volume for a swelling   | 30 cm3                   | 80 cm3
improvised by two 'Alpenliebe'       |                          |
toffees                              |                          |
The actual volume of two 'Alpenliebe' toffees is approximately 7.5 cm3. Yet the volume reflected in reality as a swelling differs from 7.5 cm3 because of the elasticity and surface resilience of the skin.
IV. CONCLUSIONS
From the results we observed, the differences between the approximated swelling volume and the value estimated by our system could be due to various reasons.
We used the same camera to take both pictures, since no two cameras have exactly the same intrinsic parameters, and to avoid calibration. Under practical assumptions, rectification was then achieved by carefully moving the camera horizontally between shots. The system resolution has to be kept large so that points can be marked accurately each time. Due to the limited processing power of the available computers, the original pictures taken with a DSLR camera (Canon 550D, 18 MP, 18-55 mm lens) had to be compressed; thus data loss is incurred and pixel calibration is affected. It should also be highlighted that a light-invariant environment is needed, because of the high sensitivity of stereo vision to lighting: approximately the same lighting environment should be maintained across a patient's different visits to the clinic. The system can still be used for volume comparisons, which is what is ultimately needed to observe the reduction of the swelling, but standardising results into exact units is problematic unless the required conditions are provided.
However, provided the required conditions, the processing ability of the eMedica system based on the proposed architectural framework opens up its applicability to a wide range of applications, not only facial swelling but other volume-calculation applications as well, where the proposed framework can be customized to specific performance requirements such as processing speed.
Here we have tried to implement a totally new idea, and the eMedica team has brought it to the level of a deployable product, but more improvements can be added. We had to work within constraints of time, resources, and technical and medical knowledge. More research and effort will make this product better, and it will help many dental surgeries and dental clinics as well. We therefore hope that this tool will be improved in the future and released as a commercial software product.
ACKNOWLEDGMENT
The eMedica team wishes to acknowledge the Department of Computer Science & Engineering of the University of Moratuwa, including Dr. Chandana Gamage and Dr. Malaka Walpola. Our dutiful gratitude also goes to Dr. Harsha de Silva, Senior Lecturer, University of Otago, Consultant OMF Surgeon, and Associate Prof. Rohan De Silva, for proposing the project idea and providing the basic medical knowledge we needed on the subject.
REFERENCES
[1] J. Huang, V. Blanz and B. Heisele, "Face Recognition with Support Vector Machines and 3D Head Models," 2002.
[2] B. Gokberk, M. O. Irfanoglu and L. Akarun, "Representation plurality and fusion for 3D face recognition," 2006.
[3] A. M. Bronstein, M. M. Bronstein and R. Kimmel, "Expression-Invariant 3D Face Recognition," 2003.
Figure 5 : Left & Right image inputs to draw graph
[4] (2012) Time-of-flight camera. [Online]. Available: http://en.wikipedia.org/wiki/Time-of-flight_camera
[5] M. Young, E. Beeson, J. Davis, S. Rusinkiewicz and R. Ramamoorthi, "Viewpoint-Coded Structured Light."
[6] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, 1st ed., USA: O'Reilly, 2008, ch. 12.
[7] J. Kim, V. Kolmogorov and R. Zabih, "Visual Correspondence Using Energy Minimization and Mutual Information," 2003.
[8] S. O. Chan, Y. P. Wong and J. K. Daniel, "Dense Stereo Correspondence Based on Recursive Adaptive Size Multi-Windowing," 2000.
[9] M. G. Keller, "Matching Algorithms and Feature Match Quality Measures for Model Based Object Recognition with Applications to Automatic Target Recognition," 1999.
[10] D. Scharstein and R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," 2001.
[11] U. Ahlvers and U. Zoelzer, "Inclusion of Magnitude Information for Improved Phase-Based Disparity Estimation in Stereoscopic Image Pairs," 2005.