


Cardiac Disease Recognition in Echocardiograms Using Spatio-temporal Statistical Models

David Beymer and Tanveer Syeda-Mahmood

Abstract— In this paper we present a method of automatic disease recognition by using statistical spatio-temporal disease models in cardiac echo videos. Starting from echo videos of known viewpoints as training data, we form a statistical model of shape and motion information within a cardiac cycle for each disease. Specifically, an active shape model (ASM) is used to model shape and texture information in an echo frame. The motion information derived by tracking ASMs through a heart cycle is then represented compactly using eigen-motion features to constitute a joint spatio-temporal statistical model per disease class and observation viewpoint. Each of these models is then fit to a new cardiac echo video of an unknown disease, and the best fitting model is used to label the disease class. Results are presented that show the method can discriminate patients with hypokinesia from normal patients.

I. INTRODUCTION

Echocardiography is often used to diagnose cardiac diseases related to regional motion, septal wall motion, as well as valvular motion abnormalities. It provides images of cardiac structures and their movements, giving detailed anatomical and functional information about the heart from several standard viewpoints such as the apical 4-chamber view, parasternal long-axis view, etc. Cardiologists routinely use the echo videos to aid their diagnosis of the underlying disease. However, since the available tools provide few quantitative measurements about the complex spatio-temporal motion of the heart, physicians usually resort to 'eyeballing' the disease. Discerning motion abnormalities, such as when the myocardium contracts significantly less than the rest of the tissue, is, in general, difficult for humans. Unlike the interpretation of static images such as X-rays, it is difficult for physicians to describe the nature of the abnormalities in moving tissues. Thus, tools that automate the disease discrimination process by capturing and quantifying the complex 3D non-rigid spatio-temporal heart motion can significantly aid the decision support process of physicians.

In this paper we present a method of automatic disease recognition using statistical spatio-temporal models of diseases in cardiac echo videos. Unlike previous approaches to cardiac disease discrimination that perform feature extraction and similarity search on global average motion [10], we perform a detailed modeling of the appearance (shape and texture) using the well-known Active Shape Models [3]. The motion information derived by tracking ASMs through a heart cycle is then represented compactly using eigen-motion features per disease class.

The authors are with the Healthcare Informatics Group, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120 USA. email: [email protected]

Each of these models is then fit to a new cardiac echo video of an unknown disease, and the best fitting model is used to recognize the disease class. Since different diseases are best diagnosed in different characteristic views, we build the joint spatio-temporal model for chosen viewpoints per disease class. The average fit for all such viewpoints is then used to rank the overall disease model fits. We demonstrate the effectiveness of the model fitting approach to disease discrimination on patients with hypokinesia, and show that they can be discriminated from normal patients.

The rest of the paper describes this approach in detail. In Section II, we review related work on automatic cardiac disease discrimination and spatio-temporal modeling of cardiac echo videos. In Section III, we describe our overall approach to disease recognition from spatio-temporal models. Finally, in Section IV, we present results on discriminating hypokinesia patients from normal patients.

II. RELATED WORK

The estimation of cardiac motion and deformation from cardiac imaging has been an area of major concentration in medical image analysis [4]. More work has been done on segmenting and motion tracking of heart regions such as the left ventricle than on classifying normal and abnormal motion patterns. This work includes methods for ventricular segmentation [5], motion tracking [7], measurements on tagged MRI [12], and functional measurements of blood flow [6]. Regional segmentation has been an active research area, with all methods requiring some assistance in identifying initial candidate regions. Methods relying on phase-contrast MRI also exist to study cardiac motion and deformation [6], [8]. In general, all these methods depend on an accurate segmentation of the myocardium walls, but have the advantage of being imaging-modality independent. The dependency on obtaining an accurate segmentation, however, remains a significant issue, as there still are no fully automated, robust, and efficient left ventricle (LV) surface segmentation methods.

Recently, there have been efforts to discriminate diseases by analyzing spatio-temporal properties of heart regions. In [11], the left ventricular region was modeled using volumetric meshes to measure both geometric and motion information throughout a heart cycle. In [10], a global estimation of the average direction and extent of motion was done using average velocity curves. Features were extracted from the average velocity curves and clustered to discriminate various cardiac diseases. The global motion curves do not accurately capture the regional motion patterns observed in cardiac motion, where the septal wall moves quite differently from the valvular regions.


Later work combined region segmentation using a graph-theoretic approach based on normalized cuts [9] with motion signatures to improve disease discrimination. Even so, the inaccuracies of region segmentation using a graph-theoretic approach and rough motion estimation using average velocities often lead to inaccuracies in disease discrimination.

Finally, the approach presented here utilizes the machinery of Active Shape Models (ASMs) [3] and low-dimensional statistical models for motion. ASMs are an effective statistical tool for modeling the appearance and texture of nonrigid objects, and they have been extensively applied in computer vision. They have also been previously used for echocardiogram segmentation [1], tracking, and for region localization and shape correspondence [9]. When combined with motion modeling, these statistical models present an alternate approach to disease discrimination based on precise modeling of spatio-temporal information. Specifically, by learning such models from several example echo videos per cardiac disease, we can express disease recognition as a problem of fitting the appropriate spatio-temporal model.

III. SPATIO-TEMPORAL STATISTICAL MODELS

Our approach to disease discrimination is based on building spatial and temporal models from sample learning data for both disease and normal patient groups, and using a matching algorithm to resolve the best matching model. Since different diagnostic viewpoints result in very different spatio-temporal sequences, it is necessary to build spatio-temporal models separately per viewpoint. Further, since the viewpoints used for diagnosis vary from disease to disease, the number of viewpoint-dependent spatio-temporal models varies across diseases. For disease discrimination, therefore, we use spatio-temporal models from corresponding viewpoints for the purpose of model fitting. The overall approach is as follows. For each disease and each of its diagnostic viewpoints, we collect a set of patient training cardiac sequences Seq_view (Fig. 1, left) and we train two models: a spatial active shape model ASM_view to represent the spatial information and a motion model Mot_view to represent the motion within a heart cycle. The ASM models build a disease-specific model of cardiac appearance for individual frames. The motion model represents an entire sequence of tracked features by projecting them into a set of disease-specific "eigenmotions." Given test sequences from a new patient P with unknown cardiac disease, we select candidate ASM and motion models for the corresponding viewpoints from the learned disease models and try to fit the models to the new sequences. For a given disease, the model fitting scores for the viewpoints are averaged to ensure a good fit to all viewpoint sequences taken from patient P. The disease label of the spatio-temporal model that shows the best average fit is then taken as the disease label for the new patient. Thus we combine evidence from different echocardiographic viewpoints and diseases using a common statistical framework.
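To make the organization of these per-disease, per-viewpoint models concrete, the following is a minimal Python sketch of the containers such a system might use; the class and field names (ASMModel, MotionModel, disease_models) are our own illustrative assumptions, not part of the paper.

from dataclasses import dataclass
import numpy as np

@dataclass
class ASMModel:
    # Per-viewpoint active shape model (Section III-A): PCA over shape and texture.
    mean_shape: np.ndarray      # s-bar
    eigenshapes: np.ndarray     # S, one eigenshape per column
    shape_eigvals: np.ndarray
    mean_texture: np.ndarray    # t-bar
    eigentextures: np.ndarray   # T, one eigentexture per column
    tex_eigvals: np.ndarray

@dataclass
class MotionModel:
    # Per-viewpoint eigenmotion model (Section III-C).
    mean: np.ndarray            # m-bar
    M: np.ndarray               # eigenmotions, one per column
    eigvals: np.ndarray
    target_len: int             # n, the standardized cycle length

# Disease models are keyed by disease name, then by diagnostic viewpoint,
# e.g. disease_models['hypokinesia']['A4C'] -> (ASMModel, MotionModel).
disease_models: dict[str, dict[str, tuple[ASMModel, MotionModel]]] = {}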


Fig. 1. Our system trains disease-specific models of echocardiogram appearance and motion for the diagnostically important viewpoints for each disease. The models consist of an active shape model (ASM) for static appearance, and a motion model for tracked feature motion in the cardiac cycle. A4C, PLA, PSAB, and PSAP are acronyms for cardiac echo viewpoints, and stand for, respectively, apical 4-chamber, parasternal long axis, parasternal short axis - basal level, and parasternal short axis - papillary muscle level.

A. Appearance modeling

Using the paradigm of Active Shape Models [3], we represent shape information in an echo frame by a set of feature points obtained from important anatomical regions depicted in the echo sequences. For example, in apical 4-chamber views, the feature points are centered around the mitral valve, interventricular septum, interatrial septum, and the lateral wall (Fig. 2). We chose not to trace the entire boundary of the left ventricle because the apex is often not visible due to the ultrasound zoom and transducer location. Wall boundaries are annotated on both the inner and outer sections. While the feature points are manually isolated during the training stage, they are automatically identified during matching. Thus, the cardiac region information used during model building captures features important for cardiac assessment.

For each disease, a number of training images were collected from cardiac cycle video clips, covering different patients, diagnostically important views, and time offsets within the cycle. We assume that each echo video clip represents a full heart cycle. If more than one heart cycle is present, they can be segmented into cycles based on the synchronizing ECG often present in cardiac echo videos.

To model the appearance, we represent the features extracted from each echo video frame by a shape vector s obtained by concatenating the locations (x, y) of n features f_1, f_2, ..., f_n as

s = [x_1, y_1, ..., x_n, y_n]^T.      (1)

Next, we form image patches centered around the feature locations to capture texture information. Given a training image with n features, the texture vector t concatenates the pixel intensities or RGB values from all the patches into one long vector, where the patch size is matched to the pixel spacing between features. Fig. 2 shows the feature points we use for apical 4-chamber views. After shape annotation, the shape vectors s are geometrically normalized by a similarity transform for model generation. The texture vector t_raw is similarly normalized for echo gain and offset by subtracting the mean and dividing by the standard deviation as

t = (t_raw − t̄_raw) / σ(t_raw).      (2)

Fig. 2. ASM feature points for apical 4-chamber view. (Best viewed in color.)

Fig. 3. Top row: Mean shape s̄ and top shape eigenvectors. Bottom row: Mean texture t̄ and top texture eigenvectors (for clarity of presentation, texture is shown as one image instead of patches).
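As a rough illustration of Eqs. (1) and (2), the sketch below assembles the shape vector and a gain/offset-normalized texture vector from a gray-level frame; the fixed patch_radius and the lack of image-boundary handling are simplifying assumptions (the paper matches patch size to the feature spacing).

import numpy as np

def shape_vector(points):
    # Stack n (x, y) feature points into s = [x1, y1, ..., xn, yn]^T (Eq. 1).
    return np.asarray(points, dtype=float).reshape(-1)

def texture_vector(image, points, patch_radius=4):
    # Concatenate square patches centered on each feature point, then
    # normalize for echo gain and offset (Eq. 2).
    patches = []
    for x, y in points:
        r, c = int(round(y)), int(round(x))
        patch = image[r - patch_radius:r + patch_radius + 1,
                      c - patch_radius:c + patch_radius + 1]
        patches.append(patch.astype(float).ravel())
    t_raw = np.concatenate(patches)
    return (t_raw - t_raw.mean()) / t_raw.std()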

To construct the ASM model, we reduce the dimensionality of the spatial and textural vectors using PCA to find a small set of eigenshapes and eigentextures. The shape s and texture t can then be linearly modeled to form the active shape model as

s = S a + s̄,   S = [ e^s_1  e^s_2  ...  e^s_p ],
t = T b + t̄,   T = [ e^t_1  e^t_2  ...  e^t_q ],      (3)

where the p eigenshapes e^s_1, e^s_2, ..., e^s_p and the q eigentextures e^t_1, e^t_2, ..., e^t_q are retained in the PCA (see Fig. 3). The p-dimensional vector a and the q-dimensional vector b are the low-dimensional representations of shape and texture.
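The eigenshapes and eigentextures in Eq. (3) come from a standard PCA of the training shape and texture vectors. A minimal sketch, usable for both (and reusable later for motion), might look as follows; the helper names are our own.

import numpy as np

def pca_model(X, n_components):
    # PCA over the rows of X (one training vector per row). Returns the mean
    # vector, a matrix with one eigenvector per column, and the eigenvalues,
    # i.e. the ingredients of Eq. (3). One extra eigenvalue is kept for the
    # normalization used later in Eqs. (4) and (5).
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = (S ** 2) / (X.shape[0] - 1)
    return mean, Vt[:n_components].T, eigvals[:n_components + 1]

def project(v, mean, E):
    # Low-dimensional coefficients (a, b, or c): v is approximated by E @ coeffs + mean.
    return E.T @ (v - mean)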

B. ASM Fitting

Fitting an ASM to a new image involves finding a similarity transform Γ_sim to position the model appropriately in the image and recovering the shape and texture vectors a and b. This is iteratively estimated by alternating between shape and texture update steps. Unlike most ASM techniques, which require a manual initialization to start the model fitting, we use an automatic initialization method where a distance-to-eigenspace method is first used to generate seed ASM initializations. The seed locations are then refined and evaluated using a coarse-to-fine pyramid approach, where seed hypotheses are pruned if their fitting error rises above a threshold. The ASM seed hypotheses that reach the finest pyramid level in the first frame are then tracked across the rest of the frames of the sequence, with the fitting result for frame t serving as the initial conditions for frame t+1. If multiple hypotheses survive sequence tracking, they are ranked by their fitting score.

To evaluate an ASM fit at a given position, we measure the error of fit in shape space and texture space using the Mahalanobis distance and the texture reconstruction error, suitably normalized. For image I, the ASM fit (a, b, Γ_sim) is

fit(a, b, Γ_sim) = a^T Σ_shp^{-1} a + b^T Σ_tex^{-1} b + 2R^2 / λ_tex^{q+1},      (4)

where R = ||t − T T^T t||, t = I(Γ_sim(x, y)) is the extracted object texture from the input image, λ_tex^{q+1} is the (q+1)-th texture eigenvalue, Σ_shp is a diagonal (p × p) matrix with shape PCA eigenvalues, and Σ_tex is a diagonal (q × q) matrix with texture PCA eigenvalues (see Cootes and Taylor [2]).
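Using the PCA helpers sketched above, Eq. (4) could be computed roughly as follows; we follow the formula as written, and tex_eigvals is assumed to contain at least q+1 eigenvalues so the residual can be normalized by the (q+1)-th texture eigenvalue.

import numpy as np

def asm_fit_score(a, b, t, T, shape_eigvals, tex_eigvals):
    # Eq. (4): Mahalanobis distances in shape and texture space plus the
    # normalized texture reconstruction error R^2 = ||t - T T^T t||^2.
    q = T.shape[1]
    mahal_shape = np.sum(a ** 2 / shape_eigvals[:len(a)])
    mahal_tex = np.sum(b ** 2 / tex_eigvals[:q])
    residual = t - T @ (T.T @ t)
    return mahal_shape + mahal_tex + 2.0 * float(residual @ residual) / tex_eigvals[q]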

C. Motion Modeling

Next, we consider the modeling of heart motion within a cardiac cycle. Using an ASM to track cardiac motion through a cycle with m frames will produce m sets of feature points {s_1, s_2, ..., s_m}, where each s_i is a column vector of stacked ASM (x, y) feature coordinates. To create a canonical representation, we vectorize the motion in the cardiac cycle by normalizing for image-plane geometry and standardizing the time axis to a target length n as follows:

1) Geometry normalization. Align the first frame s_1 to a canonical position using a similarity transform Γ_sim^1. Apply the same similarity transform to standardize all frames:

   s_i ← Γ_sim^1(s_i).

   Thus, all motion will be described in a coordinate frame that factors out scale, rotation, and translation.

2) Time normalization. Interpolate the s_i's in time using piecewise linear interpolation to standardize the sequence length from the input length m to a target length n.

3) Shape decoupling. Factor out object shape by subtracting out frame 1, creating our final motion vector m:

   m = [ s_2 − s_1, s_3 − s_1, ..., s_n − s_1 ]^T.

   If each s_i contains F ASM points, the final vector m has dimensionality 2F(n − 1).
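A compact sketch of this vectorization is given below; the align argument stands in for the similarity transform Γ_sim^1 computed from the first frame (returned here as a 2x2 matrix A and a translation b), which we treat as supplied by a hypothetical alignment routine.

import numpy as np

def vectorize_motion(frames, target_len, align):
    # frames: list of shape vectors s_1..s_m laid out as in Eq. (1).
    A, b = align(frames[0])                      # 1) geometry normalization
    norm = np.stack([
        ((A @ np.reshape(s, (-1, 2)).T).T + b).reshape(-1)
        for s in frames
    ])                                           # shape (m, 2F)

    old_t = np.linspace(0.0, 1.0, len(frames))   # 2) time normalization
    new_t = np.linspace(0.0, 1.0, target_len)    #    (piecewise linear)
    resampled = np.stack(
        [np.interp(new_t, old_t, norm[:, j]) for j in range(norm.shape[1])],
        axis=1)

    return (resampled[1:] - resampled[0]).ravel()  # 3) shape decoupling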

Given a set of example training motions m_i for a given disease and viewpoint, our linear model for a new motion m is m = Σ_i c_i m_i. Applying PCA to the training set m_i yields a set of eigenmotions e^m_i and a mean motion m̄, giving us a lower-dimensional representation

m = M c + m̄,   M = [ e^m_1  e^m_2  ...  e^m_r ],

where r eigenmotions are retained. The r-dimensional weight vector c is our representation of object motion. Note that this representation is similar to Active Appearance Motion Models [1]. While their emphasis is to use the eigenmotions to assist/constrain tracking, we use motion eigenanalysis as a feature for disease classification.

To collect training motions, the manual labeling effort is significant since we must annotate entire sequences. We developed a semiautomatic approach that leverages our ASM tracker. We ran our tracker on a set of training sequences, evaluating the returned tracks and making manual corrections as necessary. After vectorizing the motion vectors and applying PCA, we can obtain motion models for our disease classes. Fig. 5 shows two example motion models from our experimental results in the next section; notice the much greater motion for the normal class as compared to hypokinesia.

A new sequence m can be analyzed to evaluate the degree of fit to our linear motion model. First, we project m to find c,

c = M^T [m − m̄],

and then the motion fit is a combination of the Mahalanobis distance and the motion reconstruction error,

m-fit(c) = c^T Σ_mot^{-1} c + 2R_mot^2 / λ_mot^{r+1},      (5)

where R_mot = ||m − M M^T m||, λ_mot^{r+1} is the (r+1)-th motion eigenvalue, and Σ_mot is a diagonal (r × r) matrix with motion PCA eigenvalues. The Mahalanobis term is a weighted distance from the PCA projection of m to the mean motion m̄ (but within the PCA model space). The reconstruction term measures the distance from m to the PCA projection of m, and it tells how well the model explains the data.
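Continuing the same sketch conventions, Eq. (5) could be evaluated as below; as with Eq. (4) we follow the residual R_mot = ||m − M M^T m|| exactly as written, and assume mot_eigvals holds at least r+1 eigenvalues.

import numpy as np

def motion_fit(m_vec, m_mean, M, mot_eigvals):
    # Eq. (5): Mahalanobis distance of the eigenmotion coefficients plus the
    # normalized motion reconstruction error.
    r = M.shape[1]
    c = M.T @ (m_vec - m_mean)                  # projection c = M^T [m - m-bar]
    mahal = np.sum(c ** 2 / mot_eigvals[:r])
    residual = m_vec - M @ (M.T @ m_vec)
    return mahal + 2.0 * float(residual @ residual) / mot_eigvals[r]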

D. Disease recognition algorithm

Putting it all together, Fig. 4 summarizes our algorithm for recognizing cardiac disease. We assume that all sequences used for training as well as testing represent single heart cycles, all synchronized at the peak of the R wave. The input to the algorithm is a set of sequences Seq_i1, Seq_i2, ... from chosen viewpoints of a query patient with unknown cardiac disease, and a set of trained models per disease. When matching the query against a given disease D_i, we first find the set of viewpoints K that they have in common. For each common viewpoint k ∈ K, we compute the fit between the input sequence Seq_k and the models ASM_k and Mot_k as

ViewFit(i, k) = (1/F) Σ_{f=1}^{F} fit(a_f, b_f, Γ_f) + m-fit(c).      (6)

The first term, from equation (4), averages the appearance fit over the F frames in Seq_k. The second term, from equation (5), adds in the motion fit from the entire sequence.

Cardiac Disease Recognition Algorithm

Input:
  Patient P: input sequences labeled by viewpoint
    P : Seq_i1, Seq_i2, ... for viewpoints i1, i2, ...
  N disease models D_i: ASM and motion models for D_i's diagnostic views j1, j2, ...
    D_i : ASM_j1, Mot_j1, ASM_j2, Mot_j2, ...

Algo:
  for disease i from 1 to N do
    K = {i1, i2, ...} ∩ {j1, j2, ...}
    for all views k ∈ K do
      1. Fit ASM_k to frame 1 of Seq_k
      2. Track feature points in the remainder of Seq_k
         → generates (a_f, b_f, Γ_f) over Seq_k, 1 ≤ f ≤ F, F = |Seq_k|
      3. Project tracks onto Mot_k
         → generates motion vector c
      4. Compute the view fit
         ViewFit(i, k) = (1/F) Σ_{f=1}^{F} fit(a_f, b_f, Γ_f) + m-fit(c)
    DiseaseFit(i) = (1/|K|) Σ_{k ∈ K} ViewFit(i, k)
  return argmin_i DiseaseFit(i)

Fig. 4. Our algorithm for cardiac disease recognition.

Finally, the disease fitting score is the average over all k ∈ K, and the disease with the minimum fitting score is reported.

The terms in our fitting score allow us to integrate information from a variety of sources. Spatial shape and textural appearance match information is incorporated into the ASM a and b terms in fit(a, b, Γ). Motion similarity to the disease motion model is captured by the c term in m-fit(c). Finally, information from diagnostically important cardiac viewpoints is fused in the view averaging step to compute DiseaseFit(i).
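Putting the pieces of Fig. 4 into the same sketch, the recognition loop could be organized as follows. Here track_asm (which fits the ASM to frame 1, tracks the remaining frames, and returns the per-frame fit scores of Eq. (4) together with the tracked shape vectors) and align_to_canonical are hypothetical helpers, while disease_models, vectorize_motion, and motion_fit refer to the containers and functions sketched earlier.

def recognize_disease(patient_seqs, disease_models):
    # patient_seqs maps a viewpoint name (e.g. 'A4C') to that patient's
    # cardiac-cycle sequence; disease_models maps a disease name to
    # {viewpoint: (ASMModel, MotionModel)}.
    best_disease, best_score = None, float("inf")
    for disease, views in disease_models.items():
        common = set(patient_seqs) & set(views)          # K = shared viewpoints
        if not common:
            continue
        view_fits = []
        for k in common:
            asm, mot = views[k]
            frame_fits, tracks = track_asm(asm, patient_seqs[k])   # Eq. (4) per frame
            m_vec = vectorize_motion(tracks, mot.target_len, align_to_canonical)
            view_fits.append(sum(frame_fits) / len(frame_fits)     # ViewFit(i, k), Eq. (6)
                             + motion_fit(m_vec, mot.mean, mot.M, mot.eigvals))
        score = sum(view_fits) / len(view_fits)          # DiseaseFit(i)
        if score < best_score:
            best_disease, best_score = disease, score
    return best_disease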

To represent the null hypothesis that the patient has no disease, we always add a "normal" class to model the non-disease state. The diagnostic viewpoints for the normal class are taken as the union of viewpoints from all the disease classes. This will always allow for the normal class to be compared against any of the disease classes when analyzing a new patient.

IV. RESULTS

We now present experimental results on a 2-class problem: hypokinesia (D_0) versus normal patients (D_1). Hypokinesia is a cardiac condition where the heart suffers from reduced motion, so we expect the motion models to be quite different between the two classes. Fig. 5 shows the mean motion of normal patients versus hypokinesia patients. It can be seen from this figure that the mean motion of normal patients is much greater than the hypokinetic mean. Thus, we expect the motion-related term in the fitting metric to differentiate hypokinesia from normal patients.


(Fig. 5 panels: Normal Mean, Normal e^m_1; Hypokinetic Mean, Hypokinetic e^m_1.)

Fig. 5. Mean motion and first eigenmotion for normal patients and hypokinesia patients. Time position in the cycle is encoded using line segment hue. A large difference in mean motion intensity is immediately evident between normal and hypokinesia patients. (Best viewed in color.)

The diagnostic viewpoint we use for hypokinesia is apical 4-chamber (A4C). We built A4C models of hypokinesia and normal patients, and we matched these models against the A4C sequences of new patients with unknown disease labels.

Our data set came from hospitals in India, with cardiologists recording a complete echo exam in continuous video. From these videos of the complete workflow, we manually extracted individual A4C cardiac cycles from both normal and hypokinesia patients. The echo "workflow" video was captured at 320x240 and 25 Hz, and an ECG trace waveform at the bottom allowed us to synchronize extracted cycles with the ECG R peak. Each video sequence was about 5 minutes long per patient and depicted several heart cycles. Our current collection has over 200 echo videos, or 200x5x30x60 frames. The physicians in our project assisted us in their interpretation, so that the ground truth labels for the cardiac cycles could be obtained. Due to the effort in manual labeling for model training, we report our results on a data set with 16 hypokinesia patients and 5 normal patients.

The testing results for our disease recognition algorithm are given in Table I. During testing, because of our data set size, we employed a leave-one-out methodology: when testing a sequence from patient X, the entire data set minus X is used for training. From Table I, we notice that the recognition rate for the normal class is lower than that of the hypokinesia class, which lowers the average recognition rate. We expect that increasing the number of patients in the normal class will lead to better recognition rates.
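For concreteness, the leave-one-out protocol amounts to the loop below; train_models (which rebuilds the disease models from a training subset) and the per-patient ground-truth labels are hypothetical inputs, and recognize_disease is the sketch given earlier.

def leave_one_out_accuracy(patients, train_models, recognize_disease):
    # patients: list of (sequences_by_view, true_label) pairs.
    correct = 0
    for i, (seqs, label) in enumerate(patients):
        models = train_models(patients[:i] + patients[i + 1:])   # train on all but patient i
        if recognize_disease(seqs, models) == label:
            correct += 1
    return correct / len(patients)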

While this hypokinesia vs. normal experiment emphasizes the motion term in equation (6), the shape appearance term in (6) would help distinguish diseases characterized by shape differences, such as the enlarged heart chambers in dilated cardiomyopathy.

                   All patients    Hypokinesia D0    Normal D1    Disease Ave
                   (N = 21)        (N = 16)          (N = 5)      (D0 + D1)/2.0
Recognition rate   85.7%           93.7%             60.0%        76.8%

TABLE I. DISEASE RECOGNITION RATE FOR HYPOKINESIA VS. NORMAL.

V. CONCLUSION

In this paper, we have proposed a statistical appearance and motion modeling approach for directly detecting cardiac disease in echocardiographic sequences. We use Active Shape Models to represent heart shape and texture appearance in individual frames. To represent motion, we construct disease-specific PCA "eigenmotion" models based on the tracked motion of the ASM shape feature points. Disease recognition is posed as a model fitting problem where we fit the ASM and motion models trained for each disease to each new patient's data and report the disease with the best average fit over viewpoints. Overall, our framework allows us to integrate measures of shape, textural appearance, and motion, and it provides a means to fuse evidence across different echo viewpoints. Our initial experiments have shown the effectiveness of this approach in discriminating between hypokinesia patients and normal patients. Future work will involve more extensive experiments on a larger number of disease classes and viewpoints.

REFERENCES

[1] J. Bosch, S. Mitchell, B. Lelieveldt, F. Nijland, O. Kamp, M. Sonka, and J. Reiber. Automatic segmentation of echocardiographic sequences by active appearance motion models. IEEE Transactions on Medical Imaging, 21(11):1374–1383, November 2002.

[2] T. Cootes and C. Taylor. Using grey-level models to improve active shape model search. In International Conference on Pattern Recognition, volume 1, pages 63–67, 1994.

[3] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Comput. Vis. Image Underst., 61(1):38–59, 1995.

[4] I. Dydenko, F. Jamal, O. Bernard, J. D'hooge, I. E. Magnin, and D. Friboulet. A level set framework with a shape and motion prior for segmentation and region tracking in echocardiography. Medical Image Analysis, 10(2):162–177, 2006.

[5] A. Montillo, D. N. Metaxas, and L. Axel. Automated model-based segmentation of the left and right ventricles in tagged cardiac MRI. In MICCAI, pages 507–515, 2003.

[6] G. Nayler, N. Firmin, and D. Longmore. Blood flow imaging by cine magnetic resonance. J. Comp. Assist. Tomog., 10:715–722, 1986.

[7] X. Papademetris, A. J. Sinusas, D. P. Dione, and J. Duncan. 3D cardiac deformation from ultrasound images. In MICCAI, pages 420–429, 1999.

[8] N. J. Pelc. Myocardial motion analysis with phase contrast cine MRI. In Proceedings of the 10th Annual SMRM, 1991.

[9] T. Syeda-Mahmood, F. Wang, D. Beymer, M. London, and R. Reddy. Characterizing spatio-temporal patterns for disease discrimination in cardiac echo videos. In MICCAI, pages 261–269, 2007.

[10] T. Syeda-Mahmood and J. Yang. Characterizing normal and abnormal cardiac echo motion patterns. In Computers in Cardiology, pages 725–728, 2006.

[11] A. Wong, P. Shi, H. Liu, and A. Sinusas. Joint analysis of heart geometry and kinematics with spatiotemporal active region model. In EMBC, pages 762–765, 2003.

[12] E. Zerhouni, D. Parish, W. Rogers, A. Yang, and E. Shapiro. Human heart: Tagging with MR imaging - a method for noninvasive assessment of myocardial motion. Radiology, 169(1):59–63, 1988.
