JAYAMAHA ET AL: VISION BASED NON-INVASIVE TOOL – PNCTM; VOL. 2, JAN 2013
Vision Based Non-Invasive Tool for Facial Swelling Assessment
N. A. Jayamaha, P. C. Amarasingha, K. Thanansan, S. Ajanthan, and C. D. Silva
Abstract — An algorithm to quantify facial swelling by reconstructing a 3D model of the face from stereo images is presented. We analyzed the primary problems in computational stereo, namely correspondence and depth calculation. Work was carried out to determine suitable methods for depth estimation and for standardizing volume estimates. Finally, we designed software for reconstructing 3D models from 2D stereo images, built on Matlab and Visual C++. Using techniques from multi-view geometry, a 3D model of the face was constructed and refined. An explicit analysis of stereo disparity calculation methods, together with filtering out unreliable disparity estimates, was used to increase the reliability of the disparity map. Minimizing variability in camera position through more precise positioning techniques and resources will increase the accuracy of this technique and is a focus for future work.
Keywords — Calibration, Rectification, Disparity map, Depth map, 3D Reconstruction
I. INTRODUCTION
One major problem in surgery planning and diagnosis is the amount of manual work involved, such as taking facial and skull parameters and modelling parts of the skull using Plaster of Paris. Software is available for this purpose, but it is quite expensive and requires high-end technologies such as CT (Computed Tomography) or MRI (Magnetic Resonance Imaging) scanners as input devices. Moreover, almost all such software is built using parametric values defined for either Europeans or Americans.
The proposed system has many interesting applications in the surgical field. The main aim is to quantify swelling following facial surgery, and this could well be extended to surgery performed on other parts of the body. The main intention of this project is to develop a system that quantifies facial swelling and provides useful information to its users (doctors) for monitoring post-surgical swelling in a simpler, more accurate and more objective manner. Further, it could become a useful surgical research tool for the comparative evaluation of different surgical techniques for efficacy and safety.
3D reconstruction from stereo images is a difficult task; sensitivity to light, shadow and other changes in the images has a major effect on the result. Robust algorithms and filters have to be used to raise the accuracy and performance of the system.
The system identifies pixel details of both stereo images and produces a depth map. The depth and disparity plots give a fair idea of the relative positions of the various objects in the images; to improve the user's understanding, we recover the lost characteristics of the image by warping the intensity and colour values of the image onto the disparity and plotting it in a 3D view.
D. N. A. Jayamaha, P. C. Amarasingha, K. Thanansan, S.
Ajanthan and C. DeSilva are with the Computer Science &
Engineering Department, University of Moratuwa, Sri Lanka
(phone: 071-421-4340; e-mail: [email protected],
[email protected], [email protected],
[email protected], [email protected]).
II. GENERAL 3D RECONSTRUCTION APPROACHES
For the past decade the majority of 3D reconstruction research has focused on recognition from single-frame, frontal-view 2D face images of the subject. While there has been significant success in this area using techniques such as eigenfaces and elastic bunch graph matching, several issues look set to remain unsolved by such approaches, including current algorithms' inability to deal robustly with large changes in head pose and illumination.
In recent years, a mixture of non-contact, optically based 3D data acquisition techniques has been developed that can be applied to the imaging of humans. A wide variety of commercial and non-commercial off-the-shelf devices for 3D optical sensing are available and can be categorized as follows:
A. Morphable Modelling
Huang, Blanz and Heisele propose a 3D recognition
solution which utilizes a morphable 3D head model to
synthesize training images under a variety of conditions
[1]. The main idea behind the solution is that given a
sufficiently large database of 3D face models any
arbitrary face can be generated by morphing models
already in the database. In the recognition stage of their
work a component based face recognition system is used.
B. Euclidean Transformation
Based on available 3D data, classical approaches to this problem usually attempt to find a Euclidean transformation which maximizes a given shape similarity measure. Irfanoglu, Gokberk and Akarun [2] use a discrete approximation of the volume differences between facial surfaces as their Euclidean similarity measure. In
contrast Bronstein, Bronstein and Kimmel [3] propose an
alternative to this solution where they choose an internal
face representation which is invariant to isometric
distortions.
Invariance to isometric distortions allows the
recognition system to be highly tolerant to changes in
expression; this is in contrast to classical techniques
which are more suited for matching rigid objects due to
the nature of the Euclidean transformations most often
used.
C. Time of Flight Radar
Time-of-flight approaches include optical, sonar, and microwave radar, which typically calculate distances to objects by measuring the time required for a pulse of light, sound, or microwave energy to return from an object [4]. Good results are obtained for large objects. For smaller objects, however, high-speed timing circuitry is required to measure the time of flight, since the time differences to be detected are in the 10^-12 second range for about 1 mm accuracy. Unfortunately, making direct measurements of time intervals to better than 10 picosecond accuracy (roughly 1/8 inch of light travel) remains relatively expensive.
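The timing figures above can be checked with a quick back-of-envelope calculation (a sketch; the 1/8 inch figure corresponds to the distance light covers one way in 10 ps):

```python
# Back-of-envelope check of the time-of-flight numbers quoted above.
C = 299_792_458.0  # speed of light, m/s

def round_trip_time(dz_m: float) -> float:
    """Timing resolution needed to resolve a depth difference dz (pulse goes out and back)."""
    return 2.0 * dz_m / C

def one_way_distance(dt_s: float) -> float:
    """Distance light covers in dt seconds."""
    return C * dt_s

dt_1mm = round_trip_time(1e-3)       # ~6.7e-12 s: picosecond regime, as stated
d_10ps = one_way_distance(10e-12)    # ~3 mm, roughly 1/8 inch
print(f"{dt_1mm:.2e} s, {d_10ps * 1000:.2f} mm")
```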
D. Laser Scanning Triangulation
One of the most widely accepted 3D data acquisition techniques successfully applied to object surface measurement is laser scanning triangulation. The technique involves projecting a stripe of laser light onto the object of interest and viewing it from an offset camera. Deformations in the image of the light stripe correspond to the topography of the object under the stripe, which is thereby measured.
Many commercial products for 3D acquisition have been released employing laser scanning triangulation methods. Cyberware [6] developed 3D scanners based on this technology which have been used by the movie industry to create special effects.
E. Coded Structured Light
Coded structured light systems are based on projecting a light pattern instead of a single stripe and imaging the illuminated scene from one or more viewpoints [5]. This eliminates the need to scan across the surface of the object, as laser scanners must.
The objects in the scene contain a certain coded
structured pattern that allows a set of pixels to be easily
distinguishable by means of a local coding strategy. The
3D shape of the scene can be reconstructed from the
decoded image points by applying triangulation. Most of
the existing systems project a stripe pattern, since it is
easy to recognize and sample.
III. STEREO VISION
The most obvious technique for 3D reconstruction is stereo vision. Stereo imaging uses two cameras and finds the depth of each image point from the cameras, analogous to the human visual system. Computers accomplish this by finding correspondences between points seen by one imager and the same points seen by the other imager. With such correspondences and a known baseline separation between the cameras, the 3D locations of the points can be computed. Although the search for corresponding points is computationally expensive, it can be narrowed down by a process called rectification.
In practice, stereo imaging involves four steps when using two cameras [6].
1. Mathematically remove radial and tangential lens distortion, a step called undistortion. The outputs of this step are undistorted images.
2. Adjust for the angles and distances between the cameras, a process called rectification. The outputs of this step are images that are row-aligned (a point in one image lies in the same row of the second image) and rectified.
3. Find the same features in the left and right camera views, a process known as correspondence. The output of this step is a disparity map, where the disparities are the differences in the x-coordinates, on the left and right image planes, of the same feature viewed in the two cameras: (xl - xr).
4. If we know the geometric arrangement of the cameras, then we can turn the disparity map into distances by triangulation. This step is called reprojection, and the output is a depth map.
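For a rectified pair, the final step reduces to similar-triangles triangulation: a point with disparity d = xl - xr, baseline B and focal length f (in pixels) lies at depth Z = f * B / d. A minimal sketch (the numbers are made up for illustration):

```python
def depth_from_disparity(d_px: float, focal_px: float, baseline: float) -> float:
    """Depth by triangulation on a rectified stereo pair: Z = f * B / d.

    Depth comes out in the same unit as the baseline.
    """
    if d_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_px * baseline / d_px

# Example: 800 px focal length, 6 cm baseline, 10 px disparity -> 480 cm depth.
print(depth_from_disparity(10.0, 800.0, 6.0))
```

Note that depth is inversely proportional to disparity: nearer objects produce larger disparities.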
A. Stereo Calibration
To accomplish undistortion, we first have to calibrate the cameras and find their parameters. For stereo calibration, a set of chessboard images is taken simultaneously with both cameras. To find the position of any corner we only need to know how many horizontal and vertical squares the chessboard has and the size of a square. The chessboard in the image is a 9x6 chessboard, and if printed on A4 paper the squares are roughly 2.5 cm. The following OpenCV function is used for finding and drawing the corners of the chessboard images:
cvFindChessboardCorners( image, board_sz, corners, &corner_count,
                         CV_CALIB_CB_ADAPTIVE_THRESH | CV_CALIB_CB_FILTER_QUADS );
Figure 1 : Corners found in stereo calibration
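Calibration also needs the 3D positions of those corners in the chessboard's own frame; as noted above, these follow from just the corner grid size and the square size. A small sketch of that bookkeeping (pure Python; 9x6 is taken here to mean the grid of inner corners, with 2.5 cm squares assumed):

```python
def chessboard_object_points(cols: int, rows: int, square: float):
    """3D corner coordinates in the board frame: Z = 0, X/Y on a regular grid."""
    return [(c * square, r * square, 0.0)
            for r in range(rows) for c in range(cols)]

pts = chessboard_object_points(9, 6, 2.5)   # 9x6 inner corners, 2.5 cm squares
print(len(pts), pts[0], pts[-1])            # 54 corners; last at (20.0, 12.5, 0.0)
```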
B. Stereo Rectification
Stereo disparity is easiest to compute when the two image planes are exactly aligned. Unfortunately, in real-world applications a perfectly aligned configuration is rare, since the two cameras of a real stereo system almost never have exactly coplanar, row-aligned imaging planes. There are many ways to compute the rectification terms, of which OpenCV implements two:
(1) Hartley's algorithm [Hartley98], which can yield uncalibrated stereo rectification using just the fundamental matrix [6].
(2) Bouguet's algorithm, which uses the rotation and translation parameters from two calibrated cameras [6].
C. Finding Correspondence
Image points are matched across stereo image pairs and then reconstructed in three dimensions. The most common class of correspondence measures are pixel-based algorithms [7, 8], which compare similarity between pixels across images in order to deduce likely matching image points. The problem of matching 2D camera projections of real-world points across stereo image pairs leads to a host of additional issues, including input point selection and "good" match selection. Keller conducts a comprehensive evaluation of matching algorithms and match quality measures in [9]. Additional work containing a comprehensive evaluation of a large number of correspondence algorithms can be found in [10].
A number of solutions to the stereo correspondence problem have been proposed that operate on the camera input in the frequency domain. Frequency domain approaches are typically attractive because of their processing speed and inherent sub-pixel accuracy [11]. Figure 2 shows a representation of stereo projection.
Matching of a 3D point in the two different camera views can be computed only over the visual area in which the views of the two cameras overlap. To maximize the overlap, we arranged our cameras to be as close to frontal parallel as possible.
We can calculate the disparity as

    d = xl - xr,  or  d = xl - xr - (cx left - cx right)   (1)

if the principal rays intersect at a finite distance (in our case this is true), where
xl - x-coordinate of a point in the left image
xr - x-coordinate of the corresponding point in the right image.
Points in two dimensions can then be reprojected into three dimensions given their screen (image) coordinates, the disparity and the camera intrinsic matrix.
The reprojection matrix Q is:

    Q = [ 1    0    0        -cx         ]
        [ 0    1    0        -cy         ]
        [ 0    0    0         f          ]
        [ 0    0  -1/Tx  (cx - cx')/Tx   ]   (2)

Here the parameters are from the left image, except for cx', which is the principal point x-coordinate in the right image. Tx is the translation of the two cameras in the x direction.
Given a two-dimensional homogeneous point (x, y) of the left image and its associated disparity d, we can project the point into three dimensions using the matrix equation

    Q [x  y  d  1]^T = [X  Y  Z  W]^T   (3)

The 3D coordinates are then (X/W, Y/W, Z/W).
Instead of using Q directly, the real-world coordinates (X, Y, Z) can be computed using the following equations (equivalent to using the Q matrix):

    Z = (f · Tx) / d,   X = (xl · Tx) / d,   Y = (yl · Tx) / d   (4)

where
d  : disparity of the particular pixel
xl : left-image x-coordinate of the particular pixel (relative to the principal point)
yl : left-image y-coordinate of the particular pixel (relative to the principal point)
Tx : translation of the left camera with respect to the right camera in the x direction
f  : focal length in pixels of the left camera after rectification
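As a sanity check, the direct formulas can be compared against homogeneous reprojection through Q. The sketch below uses made-up intrinsics (f = 800 px, cx = cx' = 320, cy = 240, Tx = 6 cm) and plain Python; coordinates are taken relative to the principal point, and the Q route returns the mirrored sign because Tx enters Q with the opposite sign convention, so the magnitudes should agree:

```python
def reproject_via_q(x, y, d, f, cx, cy, cxp, tx):
    """Homogeneous reprojection: Q @ [x, y, d, 1]^T, then divide by W."""
    q = [
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, -1.0 / tx, (cx - cxp) / tx],
    ]
    v = [x, y, d, 1.0]
    X, Y, Z, W = (sum(q[r][c] * v[c] for c in range(4)) for r in range(4))
    return X / W, Y / W, Z / W

f, cx, cy, cxp, tx = 800.0, 320.0, 240.0, 320.0, 6.0
x, y, d = 420.0, 240.0, 10.0

X, Y, Z = reproject_via_q(x, y, d, f, cx, cy, cxp, tx)
# Direct formulas, with image coordinates taken relative to the principal point:
Xd, Yd, Zd = (x - cx) * tx / d, (y - cy) * tx / d, f * tx / d
print((X, Y, Z), (Xd, Yd, Zd))
```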
Figure 2: Stereo Projection
Figure 2 : Original Image & Disparity Map
The X, Y, Z values are the coordinates of a corner in the unit chosen during calibration: the length of the chessboard square can be given in any unit, and we selected cm (the square is 2.2 cm).
1) Extended SAD
In this method we used a block-matching technique, constructing a 3D cost array for every disparity. After iterative application of averaging filtering for each disparity, we selected the disparity d with the minimum cost C(i, j, d) as the most reliable disparity estimate for pixel (i, j) of the disparity map.

    C(i, j, d) = Σ_{(m,n) ∈ W} | Il(i + m, j + n) − Ir(i + m, j + n − d) |   (5)

Step 1: For every disparity d in the disparity search range, calculate the 3D cost array for every window.
Step 2: Apply average filtering iteratively to every 3D array calculated for a disparity value in the search range.
Step 3: For every pixel (i, j), find the minimum cost C(i, j, d) and assign its disparity index d to d(i, j), which is called the disparity map.

    C'(i, j, d) = (1/9) Σ_{m=−1}^{1} Σ_{n=−1}^{1} C(i + m, j + n, d)   (6)

For a w × w window the averaging filter value is 1/w² (1/9 for the 3 × 3 window above). The averaging (linear) filter removes very sharp changes in energy, which most likely belong to incorrect matches.
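The three steps above can be sketched in a few lines of pure Python on a tiny synthetic pair. The left image is a simple horizontal intensity ramp (chosen so the arithmetic is checkable by hand) and the right image is the left shifted by a known disparity of 2 px, which winner-take-all matching should recover:

```python
def extended_sad(left, right, max_d, ws=1):
    """Winner-take-all SAD block matching with one 3x3 averaging pass per
    cost slice, following steps 1-3 above (a simplified sketch)."""
    h, w = len(left), len(left[0])
    # Step 1: SAD cost over a (2*ws+1)^2 window, for every disparity.
    cost = {}
    for d in range(max_d + 1):
        for i in range(ws, h - ws):
            for j in range(ws + max_d, w - ws):   # keep every d in-bounds
                cost[i, j, d] = sum(
                    abs(left[i + m][j + n] - right[i + m][j + n - d])
                    for m in range(-ws, ws + 1) for n in range(-ws, ws + 1))
    # Step 2: 3x3 averaging of each disparity slice (the linear filter of eq. 6).
    smooth = {}
    for (i, j, d) in cost:
        nb = [cost[i + m, j + n, d] for m in (-1, 0, 1) for n in (-1, 0, 1)
              if (i + m, j + n, d) in cost]
        smooth[i, j, d] = sum(nb) / len(nb)
    # Step 3: for every pixel keep the disparity with minimum smoothed cost.
    return {(i, j): min(range(max_d + 1), key=lambda d: smooth[i, j, d])
            for (i, j, d) in cost if d == 0}

# Synthetic pair: right image shifted so that x_left - x_right = 2 everywhere.
H, W, TRUE_D = 8, 16, 2
left = [[10 * x + y for x in range(W)] for y in range(H)]
right = [[left[y][min(x + TRUE_D, W - 1)] for x in range(W)] for y in range(H)]
dmap = extended_sad(left, right, max_d=4)
print(all(d == TRUE_D for d in dmap.values()))
```

A real implementation would of course use textured images and larger search ranges; the ramp merely makes the unique zero-cost match at d = 2 easy to verify.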
This function takes its parameters from the input images, and the user can manually set the maximum disparity value and the window size.

function [spdmap, dcost, pcost, wcost] = stereomatch(imgleft, imgright, windowsize, disparity, spacc)

Set the corresponding parameters for the calculation:

WS  = uint16(windowsize);      % window size
WS2 = uint16((WS - 1) / 2);    % half window
D   = uint16(disparity) + 1;   % number of disparities

Then initialize the arrays needed for the disparity calculation:

pcost = zeros(heightL, widthL, D, 'uint8');
wcost = zeros(heightL, widthL, D, 'single');
dmap  = zeros(heightL, widthL, 'uint8');
dcost = zeros(heightL, widthL, 'single');

h = zeros(WS, WS, 'double');
h(1,1) = 1; h(1,WS) = -1; h(WS,1) = -1; h(WS,WS) = 1;

Now calculate the pixel cost:

for Dc = 1 : D
    maxL = widthL + 1 - Dc;
    pcost(:, Dc : widthL, Dc) = imabsdiff( imgright(:, 1 : maxL), imgleft(:, Dc : widthL) );
end

Here Dc runs from 1 to D, where D is the maximum disparity from the input, and maxL = widthL + 1 - Dc.
The system shows the resulting volume of the facial area with respect to the polygon created by three user-specified biometric points selected with the navigating cursor. To find the volume of the swelling we subtract the system-produced volume values on the corresponding days. The system-calculated volume is measured in cm3.
2) Depth from Disparity
To find depth, a calibration is done with the help of the disparity values generated by the above method. A curve is fitted by regression analysis to known depths and disparities.
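Since triangulation gives Z ∝ 1/d, one way to realise this calibration (a sketch; the paper does not specify the regression model, so a least-squares fit of depth against 1/disparity through the origin is assumed here) is:

```python
def fit_depth_model(disparities, depths):
    """Least-squares fit of Z = k / d: regress Z on u = 1/d through the
    origin, so k = sum(Z_i * u_i) / sum(u_i^2)."""
    u = [1.0 / d for d in disparities]
    return sum(z * ui for z, ui in zip(depths, u)) / sum(ui * ui for ui in u)

# Synthetic calibration data generated from a known constant k = f * B = 4800.
ds = [5.0, 8.0, 10.0, 16.0, 20.0]
zs = [4800.0 / d for d in ds]
k = fit_depth_model(ds, zs)
print(k, k / 10.0)   # fitted constant, and predicted depth at d = 10
```

In practice the known depths would come from measuring a target at several distances, and the fitted constant then converts any disparity to depth.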
Figure 4 : Final Release
Figure 3 : Left & Right image inputs to draw graph
3) Results
Thus, using the mapped depth estimation, the swelling volume can be defined as

    V = Σ_{p ∈ P} ( db(p) − df(p) ) · a   (7)

where P is the set of points inside the specified area of interest, db is the depth before swelling, df is the depth after swelling, and a is the calculated pixel area.
For experimental purposes we improvised facial swellings by asking our participants to keep two 'Alpenliebe' toffees inside the mouth.
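Equation (7) amounts to summing the per-pixel change in depth over the marked region and scaling by the ground area each pixel covers. A toy sketch (flattened depth maps, made-up numbers):

```python
def swelling_volume(depth_before, depth_after, mask, pixel_area):
    """Eq. (7): sum the per-pixel depth change over the region of interest,
    scaled by the area each pixel represents. A positive result means the
    surface moved toward the camera (i.e. swelled)."""
    return sum((db - df) * pixel_area
               for db, df, m in zip(depth_before, depth_after, mask) if m)

# Toy example: 4 masked pixels, each covering 0.25 cm^2; the surface moved
# 2 cm closer to the camera over the whole masked region -> 2 cm^3.
before = [50.0, 50.0, 50.0, 50.0, 50.0]
after  = [48.0, 48.0, 48.0, 48.0, 50.0]
mask   = [True, True, True, True, False]
print(swelling_volume(before, after, mask, 0.25))
```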
TABLE I
CONSIDERED GEOMETRIC POINTS

Considered geometric points          | Tip of the ear, top of   | Tip of the ear, top of
                                     | the nose and corner of   | the nose and tip of
                                     | lips                     | the chin
-------------------------------------+--------------------------+------------------------
System volume for a swelling         | 2.38e+66                 | 1.28e+70
improvised by two 'Alpenliebe'       |                          |
toffees (measured in a system-       |                          |
specific unit for volume)            |                          |
-------------------------------------+--------------------------+------------------------
Standardised volume for a swelling   | 30 cm3                   | 80 cm3
improvised by two 'Alpenliebe'       |                          |
toffees                              |                          |
The actual volume of two 'Alpenliebe' toffees is approximately 7.5 cm3. Yet the volume reflected in reality as a swelling differs from 7.5 cm3 because of the elasticity and surface resilience of the skin.
IV. CONCLUSIONS
From the results we observed, the differences between the approximated swelling volume and the value estimated by our system could be due to various reasons.
We used the same camera to take both pictures, since no two cameras have exactly the same intrinsic parameters, and to avoid calibration. Under practical assumptions, rectification was then achieved by carefully moving the camera horizontally between shots. The system resolution has to be kept large so that points can be marked accurately each time. Due to the limited processing power of the available computers, the original pictures taken with a DSLR camera (Canon 550D, 18 MP, 18-55 mm lens) had to be compressed; thus data loss is incurred and pixel calibration is affected. It should also be highlighted that a light-invariant environment is needed, because of the high sensitivity of stereo vision to lighting: approximately the same lighting environment should be maintained across a patient's different visits to the clinic. The system can still be used for volume comparisons, which is what is ultimately needed to observe the reduction of the swelling, but standardising results into exact units is problematic unless the required conditions are provided.
However, provided the required conditions, the processing ability of the eMedica system based on the proposed architectural framework opens up its applicability to a wide range of applications, not only facial swelling but other volume-calculation applications as well, where the proposed framework can be customized to specific performance requirements such as processing speed.
Here we have tried to implement a totally new idea, and the eMedica team has brought it to the level of a deployable product, but more improvements can be added. We had to work within constraints of time, resources, and technical and medical knowledge. More research and effort will make this product better, and it will help many dental surgeries and dental clinics as well. We therefore hope that this tool will be improved in the future and released as a commercial software product.
ACKNOWLEDGMENT
The eMedica team wishes to acknowledge the Department of Computer Science & Engineering of the University of Moratuwa, including Dr. Chandana Gamage and Dr. Malaka Walpola. Our dutiful gratitude also goes to Dr. Harsha de Silva, Senior Lecturer, University of Otago, Consultant OMF Surgeon, and Associate Prof. Rohan De Silva, for proposing the project idea and providing the basic medical knowledge we needed on the subject.
REFERENCES
[1] J. Huang, V. Blanz and B. Heisele, "Face Recognition with Support Vector Machines and 3D Head Models," 2002.
[2] B. Gokberk, M. O. Irfanoglu and L. Akarun, "Representation plurality and fusion for 3D face recognition," 2006.
[3] A. M. Bronstein, M. M. Bronstein and R. Kimmel, "Expression-Invariant 3D Face Recognition," 2003.
Figure 5 : Left & Right image inputs to draw graph
[4] (2012) Time-of-flight camera. [Online]. Available: http://en.wikipedia.org/wiki/Time-of-flight_camera
[5] M. Young, E. Beeson, J. Davis, S. Rusinkiewicz and R. Ramamoorthi, "Viewpoint-Coded Structured Light."
[6] G. Bradski and A. Kaehler, Learning OpenCV: Computer Vision with the OpenCV Library, 1st ed., USA: O'Reilly, 2008, ch. 12.
[7] J. Kim, V. Kolmogorov and R. Zabih, "Visual Correspondence Using Energy Minimization and Mutual Information," 2003.
[8] S. O. Chan, Y. P. Wong and J. K. Daniel, "Dense Stereo Correspondence Based on Recursive Adaptive Size Multi-Windowing," 2000.
[9] M. G. Keller, "Matching Algorithms and Feature Match Quality Measures for Model Based Object Recognition with Applications to Automatic Target Recognition," 1999.
[10] D. Scharstein and R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," 2001.
[11] U. Ahlvers and U. Zoelzer, "Inclusion of Magnitude Information for Improved Phase-Based Disparity Estimation in Stereoscopic Image Pairs," 2005.