a hybrid condensed finite element model with gpu acceleration for interactive 3d soft tissue cutting

COMPUTER ANIMATION AND VIRTUAL WORLDS

Comp. Anim. Virtual Worlds 2004; 15: 219–227 (DOI: 10.1002/cav.24)* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

A hybrid condensed finite element modelwith GPU acceleration for interactive 3Dsoft tissue cutting

By Wen Wu* and Pheng Ann Heng* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

To meet the requirement of computer-aided medical operations, apart from the real-time

deformation, it is also necessary in the design to simulate the tissue cutting and suturing in a

surgery simulation. In this paper, we present a model on topology change and deformation of

soft tissue, referred to as the hybrid condensed finite element model, based on the volumetric

finite element method. The most important advantage of our model is its ability to achieve

an interactive frame rate for the topology change in surgical simulation on standard PC

platform. This is achieved through two innovations. One is to apply the condensation

technique, by fully calculating the volumetric deformation in the operation part while only

calculating the surface nodes in the non-operation part. Secondly, the major calculation

work in the Conjugate Gradient solver for cutting and deformation is migrated from the CPU

to the contemporary GPU to promote the calculation. Test examples have been given to

show the feasibility and efficiency of the model. Copyright # 2004 John Wiley & Sons, Ltd.

KEY WORDS: surgical simulation; soft tissue cutting and deformation; the GPU computation;the hybrid condensed finite element model

Introduction

In recent decades, the minimally invasive microsurgery

has been developed as a renovation technique. It has the

advantages of reducing the operation time and the trau-

matizing degree of patients. It requires the surgeons to be

highly experienced in hand-eye coordination skill during

operations. Therefore, the surgery simulation is expected

to play an essential role in the future surgical training.

A good surgical simulation system should provide

surgeons with visual, tactile and behavioral illusion of

reality.1 To meet this requirement, it is crucial to solve

the problem in the development of a surgical simulation

system to achieve simulation realism with interactive

speed. In the previous works, various models have been

presented. Among those, the mass-spring model is most

widely applied in modeling deformable objects due to

its simplicity. Picinbono et al.2 used the mass-spring

model to simulate the stiff character of the liver capsule,

while Delingette et al.3 expressed the fat tissue elasticity

as a network of springs on a 3-simplex mesh. Although

these mass-spring models are easy to construct, they

lack realism since a discrete set of masses are used in the

models to simulate the continuum mechanics. Com-

pared to finite element models (FEM), the range of

possible dynamic behaviors of spring models is more

constrained in mass-spring model.4 Bro-Nielsen et al.5

discussed real-time simulation of deformable objects

using a 3D solid volumetric fast finite element model.

In their solution, three optimized methods are used

during the computation: firstly, make use of the con-

densation technique to reduce the computation time by

confining the full computation only to the surface nodes

of the mesh; secondly, the system matrix is explicitly

inverted for the transformation calculation in order to

achieve a very low calculation expense; and finally,

selective matrix vector multiplication is applied accord-

ing to the sparse character of the force vector. Their

system enables a solid deformation for relatively large

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Copyright # 2004 John Wiley & Sons, Ltd.

*Correspondence to: Wen Wu, Department of ComputerScience and Engineering, The Chinese University of HongKong, Shatin, N.T., Hong Kong.E-mail: [email protected]

objects with video frame-rate. However, this model is

not feasible for the cutting operations since in cutting

the stiffness matrix needs to be updated when the

topology change occurs. Cotin et al.6 presented a real-

time deformation method by using a preprocessing of

elementary deformations derived from a finite

element method. The linear model in their method

is enhanced by taking biomechanical results on soft

tissues into account and it produces more realistic

behavior. However simulation of tissue cutting in

real-time was impractical because several tasks should

be performed: re-meshing of the cutting area,

recalculating the stiffness matrix, and new stage of

preprocessing.

Alongwith the improvement of the hardware archi-

tecture and the increasing requirements from the

commercial entertainment/games industry, graphics

hardware has migrated away from the traditional

fixed-function pipeline to a newly reconfigurable

graphics pipeline. As a streaming processor, its out-

standing polygon and pixel processing speed has

greatly attracted researchers to utilize it into both

common graphics problems and generic computa-

tional applications. Entering 2003, the full IEEE

floating point precision data type supported in

shaders and texture stages allows the GPU to be

applied into richer body of general applications. The

applicable fields include collision detection and

object intersections,7,8 physical simulation,9,10 scienti-

fic computing such as solvers for linear algebra11 and

partial difference equations,12 and fast Fourier

transformation in image processing13 etc. In this

paper, we will demonstrate an application of the

programmable GPU in another area namely, surgery

simulation.

In this paper, we present a deformation model, called

a hybrid condensed finite element model, based on the

volumetric finite element method accelerated by GPU.

The most important advantage of our model is its ability

to achieve an interactive frame rate for the topology

change in surgical simulation. The calculation by the

GPU can be integrated into our system for taking the

sparse matrix and vector multiplication of the Conjugate

Gradient solver. In the second section, the fundamental

theory of the linear elastic finite element model and the

condensation technique will be introduced briefly. In

the third and fourth section, the structure of our model

and the mapping methods of computation onto the GPU

will be described respectively. The implementation and

experimental results are given in the fifth section.

Finally, we conclude the paper with a discussion.

Fundamental Theory

In finite element method, the continuum or object is

subdivided into elements joined at discrete node points.

The solution is subjected to constraints at the node

points and the element boundaries so that continuity

between elements is achieved.

We chose the four nodes linear tetrahedral element to

model the soft tissue in 3D domains. The displacement

variation u ¼ u; v;wð Þ within an element can be given by

the nodal displacements and the shape function as

follows

u ¼ INi; INj; INm; INp

� �ui uj um up� �T ð1Þ

where I is a 3 � 3 identity matrix and the shape function

N is defined as:

Ne ¼ae þ bexþ ceyþ dez

6V; e ¼ i; j;m; p ð2Þ

ai¼1 ¼ a1 ¼ �1ð Þi�1

xj yj zj

xm ym zm

xp yp zp

��

bi¼1 ¼ b1 ¼ �1ð Þi1 yj zj

1 ym zm

1 yp zp

��

ci¼1 ¼ c1 ¼ �1ð Þixj 1 zj

xm 1 zm

xp 1 zp

��

di¼1 ¼ d1 ¼ �1ð Þi�1

xj yj 1

xm ym 1

xp yp 1

��

In which V represents the volume of the element. The

other constants are defined by cyclic interchange of the

subscripts in the order i, j, m, p.

Linear Elasticity

The state of deformation at a point in the solid is

described by the strain tensor, which is the second order

tensor. The physical behavior of soft tissue may be

considered as linear elastic if its displacement and

deformation remain small.14 In this case, the second

order term can be neglected. We use a linear strain

approximation to represent the tissue deformation,

W. WU AND P. A. HENG* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Copyright # 2004 John Wiley & Sons, Ltd. 220 Comp. Anim. Virtual Worlds 2004; 15: 219–227

where " is a set of first order differential equations

presented by the displacement

"xx ¼@u

@x"yz ¼

@v

@zþ @w

@y

"yy ¼@v

@y"zx ¼

@w

@xþ @u

@z

"zz ¼@w

@z"xy ¼

@u

@yþ @v

@x

ð3Þ

The potential energy � of a body is the sum of the total

strain energy and the work done on the body by external

forces.15 With the assumption of the soft tissue as the

isotropic linear elastic material, there is a linear relation-

ship between the stress and the strain. We obtain the

strain energy E as follows

E ¼ 1

2

ðV�T"dV ¼ 1

2

ðV"TD"dV ð4Þ

where D is the symmetric elastic matrix which relates

physical properties of the soft tissue we modeled. It

depends on the parameters of Young’s modulus, shear

modulus and Poisson’s ratio. �T ¼ ½�xx; �yy; �zz;

�yz; �zx; �xy� and "T ¼ ½"xx; "yy; "zz; "yz; "zx; "xy� are vectors

of the stress and strain components.

Forces applied to a deformable body include: distrib-

uted body forces, distributed surface forces and con-

centrated loads. The work done by these forces can be

written as

W ¼ðVu � fbdVþ

ð�

u � f sdSþXi

ui � pi ð5Þ

where u is the displacement vector, fb x; y; z� �

are body

forces applied to the object volume dV, f s x; y; zð Þ are

surface forces applied to the object surface dS, and pi are

concentrated loads acting at the points xi; yi; zi

� �. From

(1) and (3), the strain " at arbitrary point in the element

can be expressed in terms of the nodal displacements

and the shape function

" ¼ BU ð6Þ

where U is the 3ne � 1 (ne is the number of nodes in one

element) vector of the nodal displacements. B is the

6 � 3ne matrix.

In equilibrium, the potential energy reaches the mini-

mum, which means @�=@ui ¼ 0. A set of linear equa-

tions has been derived: Keue ¼ fe, where Ke is the

stiffness matrix of a single element. To assemble all of

the stiffness matrices of the elements into one, a large

but sparse linear system in form of K u ¼ f is deduced,

where K is the global stiffness matrix in 3n� 3n, where n

is the total number of nodes in the model.

Condensation

Condensation is a process by which some of the degrees

of freedom are eliminated from the overall equilibrium

equations prior to the continuation of other phases in

the solving.16 In finite element method, it is efficient to

employ condensation at a certain stage to eliminate

variables that do not directly participate in the rest of

the solution.

To establish the equations used in condensation, we

rewrite the sparse linear system as the block matrixes

form

Kii Kib

Kbi Kbb

" #ui

ub

� �¼

f i

fb

" #ð7Þ

where the i represents the condensed degree of freedom

(DOF), and the b is the retained DOF. The upper part of

(7) can be solved for ui as

ui ¼ K�1ii f i � K�1

ii Kibub ð8Þ

To substitute this equation into the lower part of (7) for

ui, we can obtain the new equations for ub

K0bbub ¼ f 0b ð9Þ

where the reduced stiffness matrix K0bb is

K0bb ¼ Kbb � KbiK

�1ii Kib ð10Þ

and the reduced force vector f 0b is given by

f 0b ¼ fb � KbiK�1ii f i ð11Þ

The new stiffness matrix K0bb in the reduced equation (9)

is obviously denser than that in the original one. The

size of the equations is just the same as what derived

from the surface FEM. It means that, the condensed one

is mathematically equivalent to the volumetric FEM,

bearing the volume character in the solution but only in

the equal computation expense of the surface FEM

solution.

INTERACTIVE 3D SOFT TISSUE CUTTING* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


Structure ofHybridCondensedFinite ElementModel

In surgeries, most operations are concentrated on the local

pathologic area of organs. In these regions, tissues undergo

cutting or tearing, thus more precise models are required.

We assume the non-linear deformation or the topology

changes only occur in the operation part throughout the

whole surgery. With this assumption, the key idea of the

hybrid FEM algorithm is to separate the areas containing

the non-linear elastic properties or topology changes from

the others. It includes

* Operation part where the surgery occurs. It corresponds

to the pathological structures and only represents a

small part of the whole model in a simulator. In this

part, tearing and cutting are simulated.

* Non-operation part where only the deformation

occurs. But it greatly contributes to the realism of

the simulation.

Different models treat areas with different properties

respectively to optimize the trade-off between the com-

putation time and the realism of the simulation.

The illustration of the hybrid model is shown in

Figure 1. Since these two parts connect with the com-

mon nodes, additional boundary conditions are added

to both models. We rewrite the linear system equations

of the operation area (12) and non-operation area (13) as

a block matrix form respectively

K11 K1I

KI1 K1II

" #A1

AI

� �¼

P1

�PI

� �ð12Þ

K2II KI2

K2I K22

" #AI

A2

� �¼

PI

P2

� �ð13Þ

where the subscript 1, 2 and I represent the operation

area, non-operation area and the common nodes shared

with these two areas respectively, the superscript 1, 2

denote the common nodes represented in the operation

area and the non-operation area respectively. PI and

�PI is the force and counterforce respectively applied to

the common nodes when we analyse these two parts.

From (13) we obtain

ðK2II � KI2 � K�1

22 � K2IÞ � AI ¼ PI � KI2 � K�122 � P2 ð14Þ

Representing (14) in another form

PI ¼ K0 � AI þ P0 ð15Þ

where

K0 ¼ K2II � KI2 � K�1

22 � K2I ; P0 ¼ KI2 � K�122 � P2 ð16Þ

Substituting (15) into (12), we can get a new matrix

system (17), from which the displacements of the opera-

tion area A1 and AI can be obtained.

K11 K1I

KI1 K1II þ K0

" #A1

AI

� �¼

P1

�P0

" #ð17Þ

The next step is to obtain the deformation of the non-

operation area. As the inside nodes of the non-operation

area are unrelated with any action of the surgeon even

invisible, we can just treat them as redundant nodes for

the simulation. Computing these nodes only brings mean-

ingless burden of calculation. We adopt condensation to

remove them from the computation process. This keeps

the original physical character of the volumetric system

but the size of the condensed matrix equation is equal to

the FE surface model. Therefore, we can rewrite the non-

operation area equation in the condensed form as

K2II KIs KIi

KsI Kss Ksi

KiI Kis Kii

264

375

AI

As

Ai

264

375 ¼

PI

Ps

Pi

264

375 ð18Þ

The subscript i represents the internal nodes to be

condensed out and s represents the surface nodes to

be retained. From (18), we have

Kss � Ksi � K�1ii � Kis

� �� As ¼ Ps � Ksi � K�1

ii � Pi

þ Ksi � K�1ii � KiI � KsI

� �� AI

ð19Þ

Because the internal nodes hasn’t been acted by the

external force, the term of Ksi � K�1ii � Pi is equal to zero.

Then we can derive the new matrix equation K�As ¼ P�

which only relates to the variables of the surface nodes

K� ¼ Kss � Ksi � K�1ii � Kis;

P� ¼ Ps þ Ksi � K�1ii � KiI � KsI

� �� AI

ð20ÞFigure 1. The illustration of the hybrid model.

W. WU AND P. A. HENG* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


It should be noticed that, the form of P� is different from

the one derived from the condensation used in general

FEM. It has one term that relates to the common nodal

displacements AI in the hybrid FEM.

All terms in K0, P0, K� and P� are known constants and

keep unchanged in the whole surgery simulation pro-

cess because no topology change will occur in the non-

operation part. Thus they are calculated in the pre-

processing stage. This stiffness matrix in (17) will be

updated according to the topology changed during the

simulation. However, it is clear that the order of this

system is far less than the order of the global FEM

system.

In summary, the interactive update and display of the

hybrid model is achieved through the processing on the

operation area. The model of this area is updated based

on the imposed displacement or forces on the boundary.

During this stage, the displacements of the nodes and

forces applied on the common nodes are computed.

Then we can obtain the displacements of nodes on the

surface of the non-operation model according to dis-

placements of the common nodes.

CalculationontotheGPU

As mentioned above, the interaction occurs in the op-

eration area, where the collision of the scalpel and the

tissue model is detected and the cutting operation is

processed. The topology of the model is changing on-

the-fly during the simulation. The corresponding equa-

tion (17) is constructed and solved in every different

processing step. Along with the size of the model

increasing, the time spent by the collision detection rises

as a linear function of the model scale. And the time of

constructing the new global stiffness matrix is squared.

Solving the linear system is about cubed complexity,

which takes almost seventy to eighty percent of the

time executed by the operation area. Therefore, solving

the linear algebraic equations is the bottleneck of the

whole simulation. It directly impacts the feedback speed

of the system. In our system, we use the Conjugate

Gradient iteration to solve the linear equations. The

global stiffness matrix derived from FEM exhibits a

feature of large size, sparse distribution of non-zero

elements, symmetric and positive definite, this suffi-

ciently meets the requirement of the condition of ordin-

ary Conjugate Gradient algorithm. The primary

operation in each iteration of the Conjugate Gradient

algorithm is the matrix-vector multiplication, so the zero

elements of the matrix would remain unchanged in the

calculation process, which makes it more suitable for its

implementation onto the GPU.

In this section, we introduce our method of mapping

the calculation kernel onto the GPU. The Conjugate

Gradient solver includes three primary operations: mul-

tiplication of the sparse matrix and the vector, addition

of two vectors and the sum-reduction operation. The

core component is the multiplication of the sparse

matrix and the vector. We migrate this part of calcula-

tion from the CPU into the fragment processor of the

GPU, to take advantage of the fragment processor in its

efficiently manipulation of the local texture memory on

the mathematical calculation. The basic principle is to

load matrices and vectors as textures into the GPU, and

then rasterize a proper quad of pixels for invoking the

fragment program, which process the actual calculation

in the fragment processor. The result can be obtained as

the color value, transferred directly to the next pass for

execution or readback to the CPU.

Texture RepresentationofMatrices

For the sparse matrix-vector multiplication: yi ¼Pnj¼0 ai;j � xj, the matrix representation in the GPU

should consider both the system data structure

and the GPU single-instruction-multiple-data (SIMD)

character.

Since the matrix is sparse and symmetric, we use an

one-dimensional array M to store the upper-right non-

zero elements of upper-right of the matrix in our system.

Two accessorial arrays are needed to restore the original

matrix (Figure 2), N1 for the indices of the diagonal

elements in M and N2 for the column index of every

non-zero element.

Various mapping methods have been proposed in

previous works.11,12,17 In our solution, an improvement

is made to the method from12. The diagonal and off-

diagonal elements of the matrix are stored separately by

different layout textures. Two dependent textures are

used to help fetch off-diagonal elements and proper

vector elements efficiently in the fragment program

computation. The entire process of multiplication for

the off-diagonal elements is as illustrated in Figure 3.

Figure 2. The data structure of the sparse matrix.


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


Ty is the texture keeping the multiplication result, and

Tx has the same layout, which stores the vector x.

Telement keeps all the non-zero (off-diagonal) elements

of the matrix row by row and the index of every row-

starting point in Telement can be found by T index. The

texture coordinates of the corresponding elements in

vector x are stored in Tst. In12, rows were divided into

multiple groups in preprocessing, and each group re-

lated with a certain size of the texture has the equal

number of non-zero elements. The specified fragment

program processes each group. In the interactive simu-

lation, we should try to avoid such trivial work and

construct textures directly. We add one more content to

T index keeping the total number of non-zero elements in

each row, which is inspired from Buck’s presentation.18

Accordingly, in the fragment program, we use two

instructions to check and handle the case where the

pass number of processing is greater than the number of

non-zero elements in every row. To take full advantage

of the 4-channels type of texture element, we rearrange

the content of these textures with two different designs.

MethodA

Textures which have the same layout can be integrated

into one texture. As described above, using RGB chan-

nels of textures, Telement Tst and Tdiagonal T index can be

combined into one respectively (Figure 4). This compact

representation not only saves the texture memory, but

also reduces the number of instructions on texture

fetches. The pass number executed by the fragment

program is decided by the maximum number of non-

zero elements of the sparse matrix rows.

MethodB

In this method, we use all RGBA channels of Telement to

keep the non-zero (off-diagonal) elements. Because one

texture coordinates can decide four elements at one

time, if the number of non-zero elements isn’t a fourfold

number, it will be padded with zeros to let all channels

filled. Tst should provide all four coordinates of the

corresponding elements of x in Tx at one time. As

the original one texture can’t fulfill the requirement,

we use two textures Tst-a and Tst-b instead of Tst to keep

this information (Figure 5).

This strategy has more compact style of non-zero

elements representation. The fragment program is

more complex and a little bit longer. It has 13 more

instructions compared with last one, 5 of which are

texture fetches. The fragment program executes four

multiplications at one time. The current pass result

needs be accumulated with that from the previous

pass kept in a one channel texture. Therefore four

more instructions are used to accumulate these four

components before executing the accumulation between

two consecutive passes.

Implementation andExperiments

We have implemented all these algorithms by VCþþ. In

the pre-processing stage, we build an initial FEM with

both geometric shapes and physical parameters of

tissues. The geometric model of the patient organ was

generated based on CT/MRI images. The physical char-

acters include the boundary conditions of the model

such as the displacement constraints and the external

forces, the initial strains and some material parameters.

Figure 3. The entire process of the sparse matrix-vector

multiplication on the GPU.

Figure 4. The layout of Telement in method A.

Figure 5. The layout of Telement, Tst-a and Tst-b in method B.

W. WU AND P. A. HENG* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


After marking the surgery area interactively on the

mesh to determine the operation part, the non-operation

part and the interface between the two parts, the sub-

matrices K0, P0, K� and P�, which remain constant

throughout the simulation can be calculated in advance.

The data we employed comes from the Visible

Human Project.19 We created a volume tetrahedral

mesh of a short portion of the upper leg (Figure 6) for

the experiment. The thighbone is not shown in the

figures for clarity, thus leaving a hole in the model.

The nodes fixed on the thighbone have no displace-

ments. Three nodes on the surface of the operation area

are imposed with certain forces to simulate the stresses

caused by other organs or forces applied by surgical

tools. These forces can also cause the initial deformation

of the soft tissue. When the scalpel collides with the

tissue and the press is big enough, the cutting operation

occurs (see Figure 7). Figure 8 shows the deformation of

the tissue during the cutting process. For the model of

which the non-operation area has 633 nodes (416 surface

nodes) and 2134 tetrahedrons, the average execution

time for obtaining the displacement of the non-opera-

tion with condensed model is almost four times faster

than that of non-condensed model (Table 1).

Table 2 compares the time of the sparse matrix and

vector multiplication, which run in Conjugate Gradient

solver of our system, on the GPU and the CPU respec-

tively with different degrees of freedom of the operation

area. It’s the time of one multiplication on NVIDIA

GeForceFX 5950 Ultra with 5.2.1.6 driver, 1.5 GHz

Pentium 4 CPU. The size of the linear equation derived

from the operation area model is from 456 to 2190. The

maximum number of non-zero elements per row deci-

des the pass number of the fragment program, and at

each pass the result are added into that in the previous

pass. Because method B makes full use of the 4-channels

character to compactly fill non-zero elements and co-

ordinates indices into the texture, the pass number

required can be remarkably reduced so that time spent

on the data transfer between two consecutive passes can

be saved. That makes method B more efficient than

method A. Along with the increase of the data scale

processed, the performance of the method B on the

GPU can reach 2.15 times faster than the CPU

Figure 6. (a). The operation area mesh. (b). The non-operation

area mesh. (c). The whole volume tetrahedral mesh of the

shorter portion of the upper leg.

Figure 7. (a)–(h) The cutting operation.


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


implementation. Even for the method A on the GPU, an

improvement rate of 1.14 times can be achieved. It

proves that GPU implementation shows predominance

in processing large scale data due to its high degree of

parallelism, of which the CPU is lacking.

Conclusion andDiscussion

In this paper, we proposed a hybrid condensed finite

element model with the GPU hardware acceleration for

surgical simulation of tissue cutting and deformation.

The model processes the tissue as two different types.

For the first type of the tissue which is not involved with

any cutting operation, the deformation is calculated

only to the surface nodes, by a condensation technique,

so that no computation is required to a large number of

interior nodes. For the second type of the tissue involved

with the cutting operations, more comprehensive pro-

cess is applied. In our solution, as the new topology

constructs on-the-fly during the cutting procedure, the

stiffness matrix is refreshed to build the new linear

equations for the nodal displacements. Conjugate Gra-

dient iteration is adopted to solve the problem. To

improve the efficiency, we migrate the most computa-

tional intensive part of the calculation onto the GPU for

further acceleration. Two methods of mapping data

onto the GPU have been proposed in accordance with

the application data structure, and as a result, the

simulation can reach an interactive frame rate, even

with topology change to the model in a certain degree

of complexity.

As a future work, the force applied by surgical

instruments shouldn’t be restricted in the operation

area. Some other physical characters can be added to

the model to enhance the realism. On the other hand,

current methods of mapping the calculation on the GPU

are sensitive to the bandwidth of the sparse matrix. As

long as one row has a high bandwidth, a high number of

passes would be required so that the cost could be

sharply increased. In fact, we find that most of rows

have non-zero elements less than twenty. A direct

method to solve this problem is to optimize the mesh

prior to the simulation to make the incident edges for

each node fairly uniform in number. Another possible

solution is to investigate a better layout of texture for

storing the data.

Generalmodel Condensedmodel

Average time 11.372 2.906

Table1. Time of twomodels for 633 systemnodesin the non-operation area (Pentium4, 1.5GHz)

(Source: Time :ms)

DOF Max.Non-zero Max.PassNo. Max.PassNo. Time Time CPUElementNo. (GPUMethodA) (GPUMethod B) (GPUMethodA) (GPUMethod B) Time

456 188 188 47 1.913 1.118 0.622960 188 188 47 1.808 1.023 1.1161137 188 188 47 1.811 1.027 1.5001602 188 188 47 1.837 1.056 1.7311854 190 190 48 1.846 1.057 2.2882100 231 231 58 2.106 1.140 2.3232190 237 237 60 2.196 1.161 2.503

Table 2. Timeofthe sparsematrix-vectormultiplication,whichrunsintheConjugateGradient solverofour systemon theGPUandCPUrespectively, with differentdegrees of freedomof the operation area.ItisthetimeofonemultiplicationonNVIDIAGeForceFX5950Ultrawith5.2.1.6driver,1.5GHzPentium

4CPU. (Source:Time :ms)

Figure 8. The deformation during the cutting process.

W. WU AND P. A. HENG* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


ACKNOWLEDGEMENTS

This work was supported by the Research Grants Council of the

Hong Kong Special Administrative Region under Central

Allocation Grant (project no. CUHK1/00C) and Earmarked

Research Grant (project no. 4356/02E).

References

1. Bro-Nielsen M. Finite element modeling in surgery simula-tion. In Journal of the IEEE 1998; 86(3): 490–503.

2. Picinbono G, Lombardo J-C, Delingette H, Ayache N.Improving realism of a surgery simulator: linear anisotro-pic elasticity, complex interactions and force extrapolation.Tech. Rep. 4018, INRIA, Sep. 2000.

3. Delingette H, Subsol G, Cotin S, Pignon J. A craniofacialsurgery simulation testbed. In Visualization in BiomedicalComputing (VBC’94), 1994; 607–618.

4. Delingette H. Towards realistic soft tissue modeling inmedical simulation. In Proceedings of the IEEE: Special Issueon Surgery Simulation, 1998; 521–523.

5. Bro-Nielsen M, Cotin S. Real-time volumetric deformablemodels for surgery simulation using finite elements andcondensation. In Proceedings of Eurographics’96—ComputerGraphics Forum, 1996; 15: 57–66.

6. Cotin S, Delingette H, Ayache N. Real-time elasticdeformations of soft tissues for surgery simulation. IEEETransactions on Visualization and Computer Graphics 1999;5(1): 62–73.

7. Agarwal P, Krishnan S, Mustafa N, VenkatasubramanianS. Streaming geometric computations using graphics hard-ware. Tech. Rep., AT&T Labs-Research, 2002.

8. Govindaraju N, Redon S, Lin MC, Manocha D. CULLIDE:interactive collision detection between complex models inlarge environments using graphics hardware. In Proceed-ings of Graphics Hardware, 2003.

9. Harris MJ, Baxter WV III, Scheuermann T, Lastra A.Simulation of cloud dynamics on graphics hardware. InProceedings of Eurographics/SIGGRAPH Workshop onGraphics Hardware, 2003.

10. Kim T, Lin M. Visual simulation of ice crystal growth. InProceedings of ACM SIGGRAPH/Eurographics Symposiumon Computer Animation, 2003.

11. Kruger J, Westermann R. Linear algebra operators for GPUimplementation of numerical algorithms. In Proceedings ofSIGGRAPH, 2003.

12. Bolz J, Farmer I, Grinspun E, Schroder P. Sparse matrix sol-vers on the GPU: conjugate gradients and multigrid. InProceedings of SIGGRAPH, 2003.

13. Moreland K, Angel E. The FFT on a GPU. In Proceedings ofGraphics Hardware, 2003.

14. Fung YC. Biomechanics-Mechanical Properties of LivingTissues, 2nd edn. Springer-Verlag; 1993.

15. Gibson SFF, Mirtich B. A survey of deformable modelingin computer graphics. http://www.merl.com, November1997.

16. Kardestuncer H, et al. Finite Element Handbook.McGraw-Hill: New York, c1987.

17. Larsen ES, McAllister D. Fast matrix multiplies usinggraphics hardware. In Proceedings of Supercomputing, 2001;55–60.

18. Buck I, Hanrahan P. Data parallel computation on graphicshardware. Graphics Hardware 2003 Panel: GPUs as StreamProcessors, presentation ppt, 2003.

19. The Visible Human Project. Accessed April, 2004. http://www.nlm.nih.gov/research/visible/visible_human.html(*The texture of the cross section was created from the dataof the Visible Human Project.)


* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


a hybrid condensed finite element model with gpu acceleration for interactive 3d soft tissue cutting

Documents