
Chapter 13
Combining Radiology Images and Clinical Metadata for Multimodal Medical Case-Based Retrieval

Oscar Jimenez-del-Toro, Pol Cirujeda and Henning Müller

Abstract As part of their daily workload, clinicians examine patient cases in the process of formulating a diagnosis. These large multimodal patient datasets stored in hospitals could help in retrieving relevant information for a differential diagnosis, but these are currently not fully exploited. The VISCERAL Retrieval Benchmark organized a medical case-based retrieval algorithm evaluation using multimodal (text and visual) data from radiology reports. The common dataset contained patient CT (Computed Tomography) or MRI (Magnetic Resonance Imaging) scans and RadLex term anatomy–pathology lists from the radiology reports. A content-based retrieval method for medical cases that uses both textual and visual features is presented. It defines a weighting scheme that combines the anatomical and clinical correlations of the RadLex terms with local texture features obtained from the region of interest in the query cases. The visual features are computed using a 3D Riesz wavelet texture analysis performed on a common spatial domain to compare the images in the analogous anatomical regions of interest in the dataset images. The proposed method obtained the best mean average precision in 6 out of 10 topics and the highest number of relevant cases retrieved in the benchmark. Obtaining robust results for various pathologies, it could be developed further to perform medical case-based retrieval on large multimodal clinical datasets.

O. Jimenez-del-Toro (B) · H. Müller
Institute of Information Systems, University of Applied Sciences Western Switzerland Sierre (HES-SO), Sierre, Switzerland
e-mail: [email protected]

H. Müller
University Hospitals of Geneva, Geneva, Switzerland
e-mail: [email protected]

P. Cirujeda
Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain
e-mail: [email protected]

© The Author(s) 2017
A. Hanbury et al. (eds.), Cloud-Based Benchmarking of Medical Image Analysis, DOI 10.1007/978-3-319-49644-3_13


13.1 Introduction

As part of their daily workload, clinicians have to visualize and interpret a large number of medical images and radiology reports [17]. In recent years, the volume of images in medical records has increased due to the continuous development of imaging modalities and storage capabilities in hospitals [18]. Going through these large amounts of data is time-consuming and not scalable with the current trend of big data analysis [16]. Therefore, the challenge to make efficient use of these large datasets and to provide useful information for the diagnostic decisions of clinicians is of high relevance [19]. It is of particular significance to effectively combine the information contained both in the patients' medical imaging and the clinical metadata from their reports [14].

It is now common for research groups to test their retrieval algorithms on a private dataset, impeding the repeatability of their results and comparison to other algorithms [7]. The Visual Concept Extraction Challenge in Radiology (VISCERAL) project was developed as a cloud-based infrastructure for the evaluation of medical image analysis techniques on large datasets [16]. Through evaluation campaigns, challenges, benchmarks and competitions, tasks of general interest can be selected to compare the algorithms on a large scale. One of these tasks is the Retrieval Benchmark, which aims to find cases with similar anomalies based on query cases [11].

In this chapter, a multimodal (text and visual) approach for medical case-based retrieval is presented. It uses the RadLex [15] terminology and the 3D texture features extracted from medical images to objectively compare and rank the relevance of medical cases for a differential diagnosis. Via an estimation of anatomical regions of interest in the spatial domain delineated in medical images, it exploits the visual similarities in the 3D patient scans to improve the baseline text rankings [10]. The implementation of the method, the setup and results in the VISCERAL Retrieval Benchmark, and the lessons learned are explained in the following sections.

13.2 Materials and Methods

The proposed approach to retrieve relevant medical cases was based on a weighting score scheme that combined RadLex terms' anatomical and pathological correlations with local visual information.

The VISCERAL Retrieval dataset on which this method was implemented and tested is initially addressed. The clinical metadata weighting scheme is then explained. Afterwards, the various image processing techniques used for the visual feature extraction approach are shown. Finally, the fusion of both data sources, RadLex term lists and 3D patient scans, is explained in the multimodal fusion section.


13.2.1 Dataset

The Retrieval benchmark dataset was composed of patient scans (3D volumes) and RadLex anatomy–pathology term lists. The 2311 images in the dataset were obtained during clinical routine from two data providers.1 The dataset was a heterogeneous collection of images including computed tomography (CT) and magnetic resonance imaging (MRI) T1- and T2-weighted imaging, enhanced and unenhanced, with different fields of view (e.g. abdomen, whole body). For 1813 cases, RadLex anatomy–pathology term lists were generated automatically from the radiology reports of the images. They included the affected anatomical structures and their RadLex term IDs, the pathologies and their RadLex term IDs, and whether or not the findings were negated in the report. The number of findings and anatomical structures involved varied from case to case.

13.2.2 VISCERAL Retrieval Benchmark Evaluation Setup

Ten query topics, not included in the dataset, were distributed to the participants for the evaluation of their retrieval algorithms. The goal of the benchmark was to detect and rank relevant cases in the dataset that could potentially aid in the process of diagnosing the query cases. Each query topic was composed of the following data:

• List of RadLex anatomy–pathology terms from the radiology report
• 3D patient scan (CT or MRT1/MRT2)
• Manually annotated 3D mask of the main organ affected
• Manually annotated 3D region of interest (ROI) from the radiologist's perspective

Participants submitted their rankings, and medical experts performed relevance judgements on the submitted cases to determine if they were relevant for the diagnosis of each of the query topics.

13.2.3 Multimodal Medical Case Retrieval

13.2.3.1 Text Retrieval

Consider a set of $N$ medical cases $C = \langle R_1, \dots, R_N;\ V_1, \dots, V_N;\ M_1, \dots, M_N;\ F_1, \dots, F_N \rangle$, where the textual information from a radiological report $R_n$ contains a list $L$ of the anatomies $A$ and pathologies $P$ present in the medical case $C_n$. The visual information $(V_n, M_n, F_n)$ is a triple of 3D volumes, containing the patient volume $V_n$, the binary label organ mask (annotation) $M_n$ and the binary label region of interest (annotation) $F_n$.

1http://www.visceral.eu/benchmarks/retrieval-benchmark/, as of 15 July 2016.


The aim is to create a ranking $S$ of relevant cases $S = \langle C_1, \dots, C_S \rangle$ useful for a differential diagnosis with the target case $C_T$. Each case $C_n$ is evaluated according to its radiology report $R_n$ and visual information $(V_n, M_n, F_n)$, and a final score $A$ ranks the set of cases according to their relevance weight $w$.

$$S'_T = \left(C_1(w_1), C_2(w_2), \dots, C_n(w_n)\right) \tag{13.1}$$

The correlations were computed with the RadLex term lists provided from the radiology reports. Each similarity feature had a different weight in the final decision for the differential diagnosis and retrieval of cases. The textual similarity between two cases was computed according to the following correlations and their corresponding weighting scores (in brackets):

1. Same anatomy with same pathology [0.6]
2. Same anatomy with same pathology negated [0.55]
3. Same anatomy present multiple times [0.2]
4. Same anatomy mentioned once [0.1]
5. Same pathology with different anatomy [0.05]
6. Similar anatomies [0.05]
7. Same imaging modality [0.02]

The similarity features were defined using a heuristic approach, after a medical expert reviewed a subset of the RadLex term lists from randomly selected cases in the Retrieval dataset. The selected criteria were optimized on this subset and on the clinical expertise of the medical expert. The aim of the weightings is to identify and highlight clinical features that could be relevant for a differential diagnosis and to incorporate a priori knowledge of the types of image scans contained in the dataset. The ranking was performed by adding all the weights from the different similarity features for each case based on its corresponding RadLex term list. An independent score was generated for each case in the Retrieval dataset. To define similar anatomies, a list of correlating RadLex terms (e.g. lung, superior lobe, pleura...) was manually generated by a medical expert from the standard RadLex term hierarchy on the subset of randomly selected cases.2 These lists were generated for each of the query topics in the benchmark.
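To make the weighting scheme concrete, the following minimal sketch scores a candidate case against a query from their RadLex term lists. It is an illustration, not the benchmark implementation: the record layout (dicts holding (anatomy_id, pathology_id, negated) triples plus a modality string) and all names are hypothetical.

```python
# Minimal sketch of the rule-based textual scoring described above.
# The record layout is a hypothetical illustration, not the
# benchmark's actual file format.

WEIGHTS = {
    "same_anatomy_same_pathology": 0.60,
    "same_anatomy_same_pathology_negated": 0.55,
    "same_anatomy_multiple": 0.20,
    "same_anatomy_once": 0.10,
    "same_pathology_diff_anatomy": 0.05,
    "similar_anatomies": 0.05,
    "same_modality": 0.02,
}

def text_score(query, case, similar_anatomy_ids):
    """Sum the correlation weights between a query and a candidate case."""
    score = 0.0
    query_anatomies = {a for a, _, _ in query["findings"]}
    for q_anat, q_path, _ in query["findings"]:
        matches = [f for f in case["findings"] if f[0] == q_anat]
        for _, c_path, c_neg in matches:
            if c_path == q_path:
                key = ("same_anatomy_same_pathology_negated" if c_neg
                       else "same_anatomy_same_pathology")
                score += WEIGHTS[key]
        if len(matches) > 1:
            score += WEIGHTS["same_anatomy_multiple"]
        elif len(matches) == 1:
            score += WEIGHTS["same_anatomy_once"]
        if any(c_path == q_path and c_anat != q_anat
               for c_anat, c_path, _ in case["findings"]):
            score += WEIGHTS["same_pathology_diff_anatomy"]
    if any(c_anat in similar_anatomy_ids and c_anat not in query_anatomies
           for c_anat, _, _ in case["findings"]):
        score += WEIGHTS["similar_anatomies"]
    if query["modality"] == case["modality"]:
        score += WEIGHTS["same_modality"]
    return score

# Cases are then ranked by descending score, as in Eq. 13.1:
# ranking = sorted(cases, key=lambda c: text_score(query, c, sim_ids),
#                  reverse=True)
```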

13.2.3.2 Helping Multimodal Retrieval with Visual Texture Features

Multimodal retrieval can be influenced by common image processing techniques used in template matching or visual likelihood metrics for content-based image retrieval. Computer vision research areas such as image classification and pattern recognition from visible features such as colour, contours or texture have been present in recent approaches for the retrieval of medical cases with likely affected organs, image modalities or diagnosis [6].

2http://www.RadLex.org, as of 15 July 2016.


Fig. 13.1 Second-order Riesz kernels $\mathcal{R}^{(n_1,n_2,n_3)}$ convolved with isotropic Gaussian kernels $G(x)$

This section defines a methodology for content-based image retrieval via a similarity measurement from texture-based visual cues. First, a region of interest from a query image is characterized thanks to its computed 3D Riesz wavelet coefficients. In order to deal with 3D structure and also to provide a more compact representation, these features are translated into a particular descriptor space which arises from modelling the covariance matrices of the coefficient observations within a volumetric region, instead of keeping the whole set of feature values. This compact data representation is of crucial interest as it allows translating both learning image templates and unknown testing image candidates to a common space which can be used in a dictionary-seeking fashion for visual-based retrieval.

13.2.3.3 3D Riesz Transform for Texture Features

Riesz filterbanks are used in order to characterize the 3D texture of regions of interest in CT images. In previous work, 3D Riesz wavelets have demonstrated successful performance in modelling subtle local 3D texture properties, with high reproducibility compared to other methods [8, 9].

The $N$th-order Riesz transform $\mathcal{R}^{(N)}$ of a three-dimensional signal $f(x)$ is defined in the Fourier frequency domain as:

$$\mathcal{R}^{(n_1,n_2,n_3)} \hat{f}(\boldsymbol{\omega}) = \sqrt{\frac{(n_1+n_2+n_3)!}{n_1!\,n_2!\,n_3!}}\; \frac{(-j\omega_1)^{n_1}(-j\omega_2)^{n_2}(-j\omega_3)^{n_3}}{\|\boldsymbol{\omega}\|^{n_1+n_2+n_3}}\, \hat{f}(\boldsymbol{\omega}), \tag{13.2}$$

for all combinations of $(n_1, n_2, n_3)$ with $n_1 + n_2 + n_3 = N$ and $n_{1,2,3} \in \mathbb{N}$. Equation 13.2 yields $\binom{N+2}{2}$ templates $\mathcal{R}^{(n_1,n_2,n_3)}$ and forms multi-scale filterbanks when coupled with a multi-resolution framework.

In order to achieve a three-dimensional representation, the second-order Riesz filterbank (depicted in Fig. 13.1) is used, and rotation invariance is obtained by locally aligning the Riesz components $\mathcal{R}^{(n_1,n_2,n_3)}$ of all scales based on the locally prevailing orientation, as presented in [3].


13.2.3.4 Invariant Representation via 3D Covariance Descriptors

The choice of a particular set of features for an accurate texture description is as important as a representation that is able to yield invariance to scale, rotation or other spatial changes of the described region of interest. Riesz features are used in conjunction with a representation that takes into account their statistical distribution, leading to a compact and discriminative notation with several benefits for pattern recognition.

First, a spatial homogenization baseline is achieved by an indirect 3D spatial registration, where a reference image is used to register all the images from the dataset and generate a common space domain for visual comparison. The reference image is obtained from a control case of a complete patient scan in order to provide a complete alignment frame. Once a new image is provided as a query, it is first registered to the reference image and included in this rough alignment of the dataset images. Then, a set of derived regions of interest is determined for each of the images in the dataset by directly transforming the same coordinates from the ROI in the query image. See Fig. 13.2 for a scheme of this workflow.

Fig. 13.2 Finding the region of interest (ROI) from the query image in the dataset. The image with the biggest size in the dataset was selected as the reference image. In order to have a common spatial domain to compare the images, all the images from the dataset were registered in advance to this reference image using affine registration (dashed blue arrows). With a new query, the query images were also registered to the reference image, and the provided binary mask for the ROI (yellow borders) was transformed using the coordinate transformation from the affine registration of the query image. This procedure defined an indirect ROI (dashed yellow borders) in each of the dataset images to compare the visual similarities with the query image


The required registrations for this step were computed using the image registration implementation from the Elastix software3 [13]. The quality of the registration is iteratively evaluated in each optimization step of a cost function that aims to minimize the normalized cross correlation between the voxel intensities of the transformed moving image and those of the fixed target image. Using affine registration, the 3D volumes are globally aligned through an iterative stochastic gradient descent optimizer with a multi-resolution approach [12].
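The chapter's registrations were computed with Elastix; the following sketch reproduces the described setup (affine transform, normalized cross-correlation metric, multi-resolution gradient descent) using SimpleITK instead, purely as an illustration. All parameter values are placeholders and not those of the benchmark run.

```python
# Illustrative affine registration to the reference image, mirroring
# the setup described above (NCC metric, gradient-descent optimizer,
# multi-resolution pyramid). Uses SimpleITK rather than Elastix;
# parameter values are placeholders.
import SimpleITK as sitk

def register_to_reference(fixed_path, moving_path):
    fixed = sitk.ReadImage(fixed_path, sitk.sitkFloat32)
    moving = sitk.ReadImage(moving_path, sitk.sitkFloat32)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsCorrelation()  # normalized cross correlation
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetShrinkFactorsPerLevel([4, 2, 1])        # multi-resolution
    reg.SetSmoothingSigmasPerLevel([2.0, 1.0, 0.0])
    reg.SetInitialTransform(sitk.CenteredTransformInitializer(
        fixed, moving, sitk.AffineTransform(3),
        sitk.CenteredTransformInitializerFilter.GEOMETRY))
    reg.SetInterpolator(sitk.sitkLinear)

    transform = reg.Execute(fixed, moving)
    # The same transform maps the query ROI mask into the common space,
    # defining the indirect ROIs of Fig. 13.2.
    return transform
```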

The steerability property [21] of Riesz features asserts that voxel intensity values are projected onto the direction of maximum variability within the region of interest, thus providing a common reference space for all the observable tissue patterns. Therefore, features are guaranteed to be directionality invariant, which, added to the rotation-invariant representation explained below, gives additional robustness to spatial changes in the proposed covariance descriptor framework.

By their construction, covariance descriptors are suitable for unstructured, abstract texture characterization inside a region, regardless of spatial rigid transformations such as rotation, scale or translation [2]. This is due to a statistics-based representation in which covariance is used as a measure of how several random variables change together (3D Riesz texture features in this case) and serves as a discriminative signature of a region of interest. This notion translates the absolute feature space, which is sparse and high dimensional, to a meaningful lower dimensional space of feature covariances, where regions with similar texture variabilities lie clustered and differentiated. Furthermore, the construction of covariance descriptors in their natural shape as symmetric positive definite matrices adds an inherent analytical methodology: these matrices form a manifold which can be analysed by its own defined Riemannian metrics [1] for the comparison of descriptor samples.

In order to formally define the 3D Riesz-covariance descriptors, a feature selection function $\Phi(v)$ is denoted for a given 3D CT volume $v$ (in this approach, a single 96 × 96 × 96 block generated using the centre of the bounding box surrounding the manually annotated mask of the main organ affected in each of the query topics) as:

$$\Phi(v) = \left\{\mathcal{R}^{(n_1,n_2,n_3)}_{x,y,z},\ \forall x, y, z \in v\right\}, \tag{13.3}$$

which denotes the set of 6-dimensional Riesz feature vectors, as defined in Eq. 13.2, obtained at each one of the coordinates $\{x, y, z\}$ contained in the volume cube $v$.

Then, for a given region v of the CT image, the associated covariance descriptorcan be obtained as:

$$\mathrm{Cov}\left(\Phi(v)\right) = \frac{1}{N-1}\sum_{i=1}^{N}\left(\Phi_i - \mu\right)\left(\Phi_i - \mu\right)^{T}, \tag{13.4}$$

where $\mu$ is the vector mean of the set of feature vectors $\{\Phi_{x,y,z}\}$ within the volumetric neighbourhood made of $N = 96^3$ samples. Figure 13.3 shows the construction of a sample 3D Riesz-covariance descriptor.

3http://elastix.isi.uu.nl, as of 20th October 2015.


Fig. 13.3 Cues involved in the descriptor calculation for a given CT cubic region. The initial cube depicts the values within a 96 × 96 × 96 voxel volume with its CT intensities; the 6 central cubes depict the 2nd-order 3D Riesz wavelet responses, and the Riesz norm is included as well. The matrix in the right sub-figure depicts the resulting covariance descriptor, encoding the different correlations between the distributions of the observed cues

13.2.3.5 Pattern Matching in the $\mathrm{Sym}^+_d$ Manifold

The resulting 6 × 6 covariance descriptors are symmetric matrices in which the diagonal elements represent the variance of each Riesz feature, and the non-diagonal elements represent their pairwise covariances. As previously stated, these descriptors are used as discriminative signatures of the texture patterns found in the block $v$. 3D Riesz-based covariance descriptors do not only provide a representative entity, but they also lie in the Riemannian manifold of symmetric positive definite matrices $\mathrm{Sym}^+_d$. The spatial distribution of the descriptor space is geometrically meaningful, as 3D regions sharing similar texture characteristics remain clustered when descriptor similarity is computed by means of the Riemannian metrics defined for this non-Euclidean spatial distribution, as defined below. This is depicted in Fig. 13.4, where multi-dimensional scaling is used for projecting the descriptor space into a two-dimensional plot for visualization. The same notion can be used for feature selection or dimensionality reduction in the nonlinear descriptor space.

According to [1], the $\mathrm{Sym}^+_d$ Riemannian manifold constituting the covariance descriptor space can be approximated in close neighbourhoods by the Euclidean metric in its tangent space $T_Y$, where the symmetric matrix $Y$ is a reference projection point on the manifold. $T_Y$ is formed by a vector space of $d \times d$ symmetric matrices, and the tangent mapping of a manifold element $X$ to $x \in T_Y$ is made by the point-dependent $\log_Y$ operation:

$$x = \log_Y(X) = Y^{\frac{1}{2}} \log\left(Y^{-\frac{1}{2}} X Y^{-\frac{1}{2}}\right) Y^{\frac{1}{2}}. \tag{13.5}$$


Fig. 13.4 Set of image descriptors obtained from 5 organ textures (kidney, liver, lung, pancreas, urinary bladder), belonging to 200 different cubic samples from various patient CT scans. The descriptors for each class are plotted in different colours in the embedded two-dimensional space, via the multi-dimensional scaling dimensionality reduction technique, according to the descriptor similarity metric defined in Eq. 13.8. This plot demonstrates geometrical coherence as the class distribution is correlated in the descriptor space: areas with different texture features, such as liver, lung or urinary bladder, appear clustered in the descriptor space. Some areas that share texture features, such as the pancreas, appear more overlapped with other regions. In any case, this descriptor space can be used in linear or nonlinear machine learning classification methods for texture modelling

As a computational approximation in certain classification problems, the projection point can be established at a common point such as the identity matrix, and therefore the tangent mapping becomes:

$$\log(X) = U \log(D)\, U', \tag{13.6}$$

where $U$ and $D$ are the elements of the singular value decomposition (SVD) of $X \in \mathrm{Sym}^+_d$.

One property of the projected symmetric matrices in the tangent space $T_Y$ is that they contain only $d(d+1)/2$ independent coefficients, in their upper or lower triangular parts. Therefore, it is possible to apply the vectorization operation in order to obtain a linear orthonormal space for the independent coefficients:

$$x = \mathrm{vect}(x) = (x_{1,1}, x_{1,2}, \dots, x_{1,d}, x_{2,2}, x_{2,3}, \dots, x_{d,d}), \tag{13.7}$$

where $x$ is the mapping of $X \in \mathrm{Sym}^+_d$ to the tangent space, resulting from Eq. 13.5. The obtained vector $x$ lies in the Euclidean space $\mathbb{R}^m$, where $m = d(d+1)/2$. This can be used for efficient template storage in cases of big data volumes.
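The tangent mapping at the identity (Eq. 13.6) and the vectorization (Eq. 13.7) can be sketched as follows; the eigendecomposition of a symmetric positive definite matrix plays the role of the SVD here, and function names are illustrative. Note that some formulations scale the off-diagonal coefficients by √2 to make the vectorized space orthonormal; this sketch follows the plain coefficient listing of Eq. 13.7.

```python
# Sketch of Eqs. 13.6-13.7: log-map a descriptor at the identity and
# vectorize its upper triangle into R^{d(d+1)/2}.
import numpy as np

def log_map_identity(X):
    """Matrix logarithm of a symmetric positive definite matrix (Eq. 13.6)."""
    eigvals, U = np.linalg.eigh(X)
    return U @ np.diag(np.log(eigvals)) @ U.T

def vectorize(x):
    """Independent upper-triangular coefficients of a symmetric matrix
    (Eq. 13.7); length d(d+1)/2, i.e. 21 for the 6x6 descriptors."""
    return x[np.triu_indices_from(x)]
```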


This set of operations is useful for data visualization, feature selection and for developing machine learning and classification techniques on top of the particular geometric space of the proposed covariance descriptors. The tangent mapping operator can be taken into account, leading to the following Riemannian metric, which expresses the geodesic distance between two points $X_1$ and $X_2$ on $\mathrm{Sym}^+_d$ [1]:

$$\delta(X_1, X_2) = \sqrt{\operatorname{Trace}\left(\log\left(X_1^{-\frac{1}{2}} X_2 X_1^{-\frac{1}{2}}\right)^{2}\right)}, \tag{13.8}$$

or more simply $\delta(X_1, X_2) = \sqrt{\sum_{i=1}^{d} \log(\lambda_i)^2}$, where $\lambda_i$ are the positive eigenvalues of $X_1^{-\frac{1}{2}} X_2 X_1^{-\frac{1}{2}}$.

Therefore, in a similarity retrieval application in which a covariance descriptor $Q$ obtained from a query region has to be matched against a set of template region descriptors $\{T_i\}$ belonging to different classes, this distance can be used as a supporting metric for a weighted scoring system for multimodal retrieval:

$$\operatorname{class}(Q) = \operatorname*{arg\,min}_{i} \left\{\delta(Q, T_i)\ \forall i \in T\right\}. \tag{13.9}$$

Since the dimensionality of the proposed descriptors is very compact, this scoring function is computationally feasible for datasets of reasonable sizes.
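A sketch of the geodesic distance (Eq. 13.8, in its eigenvalue form) and the nearest-template rule (Eq. 13.9) follows, using SciPy's generalized symmetric eigensolver; function names are illustrative.

```python
# Sketch of Eqs. 13.8-13.9: the geodesic distance on Sym+_d in its
# eigenvalue form, and nearest-template matching.
import numpy as np
from scipy.linalg import eigh

def geodesic_distance(X1, X2):
    """delta(X1, X2) = sqrt(sum_i log(lambda_i)^2) (Eq. 13.8)."""
    # eigh(X2, X1) solves X2 v = lambda X1 v, giving the eigenvalues
    # of X1^{-1/2} X2 X1^{-1/2}
    lam = eigh(X2, X1, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def nearest_template(Q, templates):
    """Index of the closest template descriptor (Eq. 13.9)."""
    distances = [geodesic_distance(Q, T) for T in templates]
    return int(np.argmin(distances))
```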

13.2.3.6 Multimodal Fusion

It is known from previous medical case-based retrieval benchmarks that text queries obtain much better results than visual queries [4, 5]. This has been attributed to the fact that clinical signs in medical images are currently represented much more consistently by text labels than by their visual features, which are not always very specific. Therefore, it is of high interest to the information retrieval community to find robust visual features that can be combined with semantic terms [14]. To include the information obtained from the visual ranking of the cases in the semantic text weighting scheme, we give an additional weighting if the visual similarity score is high. The additional weight [0.05] is added to the total sum from the textual score of a case if it is in the top 20% of the ranking obtained from the similarity score of the covariance descriptor. These parameters were manually optimized using a small subset of the dataset. A medical expert provided a list of correlation-based similarities that are of interest for finding relevant cases in the dataset. For each of the query topics, a single main combination of anatomy and pathology RadLex terms was manually selected from the RadLex term list. This decision was based on the region of interest and organ mask provided to the participants in the benchmark.
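The fusion rule can be sketched as follows: candidates in the top 20% of the visual ranking receive the additional [0.05] weight on top of their textual score. Data structures and names are illustrative, not the benchmark implementation.

```python
# Sketch of the fusion rule: a case receives the additional [0.05]
# weight on top of its textual score when it falls in the top 20% of
# the visual similarity ranking.
VISUAL_BONUS = 0.05
TOP_FRACTION = 0.20

def fuse_scores(text_scores, visual_distances):
    """`text_scores` / `visual_distances`: dicts keyed by case ID.
    Smaller geodesic distance (Eq. 13.8) means higher visual similarity."""
    by_similarity = sorted(visual_distances, key=visual_distances.get)
    cutoff = max(1, int(len(by_similarity) * TOP_FRACTION))
    top_visual = set(by_similarity[:cutoff])
    return {cid: score + (VISUAL_BONUS if cid in top_visual else 0.0)
            for cid, score in text_scores.items()}
```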


13.3 Results

Only one run was submitted for the VISCERAL Retrieval Benchmark 2015; it combined both the RadLex ID weighting score scheme and the visual texture features. This run contained a ranking of 300 cases for each of the ten query topics in the benchmark. The cases were ranked in descending order according to the computed similarity to the query topic.

The proposed approach obtained the highest mean average precision (MAP) scores in 6 out of the 10 query topics [11]. The topic with the highest MAP was topic 08 (kidney cyst) with 0.5131, and the lowest was topic 10 (rib fracture) with 0.0467. The mean MAP over all topics was 0.2367, which was the second best MAP score of the benchmark.

Although the precision at 10 (P_10), 20 (P_20) and 30 (P_30) documents retrieved is lower than that of the method by Spanier et al. [20], our method presented a more stable decline of precision scores, obtaining the benchmark's top scores at 100 (P_100) and 200 (P_200) documents retrieved (see Fig. 13.5). Moreover, the proposed method obtained the highest total of relevant documents retrieved (num_rel_ret): 1077 out of a maximum of 2462. In 7 of the 10 query topics, it obtained the top number of total documents retrieved out of all the 21 runs in the benchmark (see Fig. 13.6). The mean average precision (MAP), precision after query-relevant cases retrieved (Rprec), binary preference (bpref), precision after 10 cases retrieved (P_10) and precision after 30 cases retrieved (P_30) of our method are shown per query topic in Table 13.1. These results are of special interest when compared against the other proposed retrieval methods, in order to identify which components particularly benefit when a multimodal-based scoring is introduced. There is a clear advantage of the method by Spanier et al. in query topic 10 when compared to our method. This topic, with a radiological diagnosis of rib fracture, had only 47 cases

Fig. 13.5 Line graph showing the mean precision scores over all the topics at a varying number of cases retrieved: 10–200. The best run was selected per participant considering all possible techniques: only text, only visual or mixed. A maximum of 300 cases could be included in each of the submitted rankings per topic


Fig. 13.6 Box plot chart with the total number of relevant cases retrieved (num_rel_ret) per topic in the VISCERAL Retrieval benchmark. The method proposed in this paper is represented with red solid bars. The results of the other participants, including text, visual and mixed runs, are shown as white boxes. The horizontal lines inside the boxes mark the median number of relevant cases retrieved. Each box extends from the first to the third quartile of the run results

considered relevant by the relevance judgements. This is one of the topics in the Retrieval Benchmark with the fewest relevant cases, making it harder to select only a few relevant cases from the complete dataset. On the other hand, our method was the only run with a mixed technique (text and visual) that produced a ranking for all of the available query topics, unlike the approach of Spanier et al.

13.3.1 Lessons Learned

Having a common dataset is fundamental to making objective comparisons between different retrieval methods. There were two topics (07 and 09) where techniques using only text data performed better than the mixed techniques. Otherwise, multimodal techniques overall obtained the best scores in the benchmark.

An advantage of scanning a large dataset of patient cases is that, as in a real clinical scenario, the distribution of diseases is not uniform. This requires a robust selection of relevant features for a successful retrieval, particularly for those diseases with few cases in the dataset.

Visual retrieval is still a complementary technique that is best used with a strong baseline of text-related similarities between medical cases. Further research is needed to detect the most relevant region of interest in the images as well as the best visual features per topic. Manual annotation of the regions of interest in the medical images can be useful to improve this technique even further, by obtaining more targeted visual features related to a specific medical case. This would avoid sampling large regions in the image and generate a more robust training set on which to build retrieval algorithms. However, this implies a significant increase in the workload of the clinicians when handling these datasets.


Table 13.1 Results for the 10 query topics in the VISCERAL Retrieval benchmark using the proposed multimodal (text and visual) retrieval approach

Metric   01      02      03      04      05      06      07      08      09      10      All
MAP      0.2293  0.2227  0.2227  0.2497  0.1949  0.3883  0.1780  0.5131  0.1212  0.0467  0.2367
Rprec    0.4576  0.3575  0.3575  0.3106  0.3508  0.4985  0.3483  0.6399  0.1667  0.0851  0.3572
bpref    0.5035  0.3466  0.3466  0.4047  0.3542  0.4912  0.3444  0.6307  0.1580  0.0837  0.3664
P_10     0.2000  0.6000  0.6000  0.8000  0.7000  0.9000  0.6000  0.8000  0.3000  0.2000  0.5700
P_30     0.5000  0.5333  0.5333  0.8000  0.5667  0.8667  0.7000  0.8000  0.1333  0.1000  0.5533



Although this method was developed for the VISCERAL Retrieval Benchmark tasks and dataset, both the clinical correlations and the general approach for obtaining relevant visual features can be implemented for similar clinical tasks. The results obtained during the VISCERAL Retrieval Benchmark showed the advantage of combining multimodal information in the search for differential diagnosis medical cases. The semi-automatic method obtained the highest scores for the majority of topics when compared to the other runs submitted to the Benchmark. It includes both textual and visual information in the queries and managed to index a dataset of >2000 medical cases with radiology reports and 3D patient scans.

13.4 Conclusions

A semi-automatic multimodal (using text and visual information) medical case-based retrieval approach is presented. A rule-based weighting of the anatomical and clinical RadLex term correlations from radiology reports is used as a baseline to find useful clinical features from the cases. The results of processing only text data (RadLex IDs) are further improved with state-of-the-art techniques (Riesz wavelets, image registration and covariance descriptors) to compute a visual similarity score between the medical images in the cases. The method was implemented and tested in the VISCERAL Retrieval Benchmark 2015, with overall promising results for the retrieval of relevant cases for differential medical diagnosis. More work is needed to address the scalability of this approach and the inclusion of new clinical cases.

Acknowledgements This work was supported by the EU in FP7 through VISCERAL (318068), Khresmoi (257528) and the Swiss National Science Foundation (SNF grant 205320–141300/1).

References

1. Arsigny V, Fillard P, Pennec X, Ayache N (2006) Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magn Reson Med 56(2):411–421

2. Cirujeda P, Mateo X, Dicente Y, Binefa X (2014) MCOV: a covariance descriptor for fusion of texture and shape features in 3D point clouds. In: International conference on 3D vision (3DV)

3. Depeursinge A, Foncubierta-Rodriguez A, Ville D, Müller H (2011) Lung texture classification using locally-oriented Riesz components. In: Fichtinger G, Martel A, Peters T (eds) MICCAI 2011. LNCS, vol 6893. Springer, Heidelberg, pp 231–238. doi:10.1007/978-3-642-23626-6_29

4. García Seco de Herrera A, Kalpathy-Cramer J, Demner Fushman D, Antani S, Müller H (2013) Overview of the ImageCLEF 2013 medical tasks. In: Working notes of CLEF 2013 (Cross Language Evaluation Forum)

5. García Seco de Herrera A, Foncubierta-Rodríguez A, Müller H (2015) Medical case-based retrieval: integrating query MeSH terms for query-adaptive multi-modal fusion. In: SPIE medical imaging. International Society for Optics and Photonics

6. García Seco de Herrera A, Müller H, Bromuri S (2015) Overview of the ImageCLEF 2015 medical classification task. In: Working notes of CLEF 2015 (Cross Language Evaluation Forum)

7. Hanbury A, Müller H, Langs G, Weber MA, Menze BH, Fernandez TS (2012) Bringing the algorithms to the data: cloud-based benchmarking for medical image analysis. In: Catarci T, Forner P, Hiemstra D, Peñas A, Santucci G (eds) CLEF 2012. LNCS, vol 7488. Springer, Heidelberg, pp 24–29. doi:10.1007/978-3-642-33247-0_3

8. Jiménez del Toro OA, Foncubierta-Rodríguez A, Vargas Gómez MI, Müller H, Depeursinge A (2013) Epileptogenic lesion quantification in MRI using contralateral 3D texture comparisons. In: Mori K, Sakuma I, Sato Y, Barillot C, Navab N (eds) MICCAI 2013. LNCS, vol 8150. Springer, Heidelberg, pp 353–360. doi:10.1007/978-3-642-40763-5_44

9. Jiménez del Toro OA, Foncubierta-Rodríguez A, Depeursinge A, Müller H (2015) Texture classification of anatomical structures in CT using a context-free machine learning approach. In: SPIE medical imaging 2015

10. Jiménez-del-Toro OA, Cirujeda P, Cid YD, Müller H (2015) RadLex terms and local texture features for multimodal medical case retrieval. In: Müller H, Jimenez del Toro OA, Hanbury A, Langs G, Foncubierta Rodríguez A (eds) Multimodal retrieval in the medical domain. LNCS, vol 9059. Springer, Cham, pp 144–152. doi:10.1007/978-3-319-24471-6_14

11. Jiménez-del-Toro OA, Hanbury A, Langs G, Foncubierta-Rodríguez A, Müller H (2015) Overview of the VISCERAL retrieval benchmark 2015. In: Müller H, Jimenez del Toro OA, Hanbury A, Langs G, Foncubierta Rodríguez A (eds) Multimodal retrieval in the medical domain. LNCS, vol 9059. Springer, Cham, pp 115–123. doi:10.1007/978-3-319-24471-6_10

12. Klein S, Pluim JP, Staring M, Viergever MA (2009) Adaptive stochastic gradient descent optimisation for image registration. Int J Comput Vis 81(3):227–239

13. Klein S, Staring M, Murphy K, Viergever MA, Pluim JP (2010) Elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging 29(1):196–205

14. Kurtz C, Depeursinge A, Napel S, Beaulieu CF, Rubin DL (2014) On combining visual and ontological similarities for medical image retrieval applications. Med Image Anal 18(7):1082–1100

15. Langlotz CP (2006) RadLex: a new method for indexing online educational materials. Radiographics 26(6):1595–1597

16. Langs G, Hanbury A, Menze B, Müller H (2013) VISCERAL: towards large data in medical imaging – challenges and directions. In: Greenspan H, Müller H, Syeda-Mahmood T (eds) MCBR-CDS 2012. LNCS, vol 7723. Springer, Heidelberg, pp 92–98. doi:10.1007/978-3-642-36678-9_9

17. Müller H, Michoux N, Bandon D, Geissbuhler A (2004) A review of content-based image retrieval systems in medicine: clinical benefits and future directions. Int J Med Inform 73(1):1–23

18. Rubin GD (2000) Data explosion: the challenge of multidetector-row CT. Eur J Radiol 36(2):74–80

19. Rubin D, Napel S (2010) Imaging informatics: toward capturing and processing semantic information in radiology images. Yearb Med Inform 2010:34–42

20. Spanier AB, Joskowicz L (2015) Medical case-based retrieval of patient records using the RadLex hierarchical lexicon. In: Müller H, Jimenez del Toro OA, Hanbury A, Langs G, Foncubierta Rodríguez A (eds) Multimodal retrieval in the medical domain. LNCS, vol 9059. Springer, Cham, pp 129–138. doi:10.1007/978-3-319-24471-6_12

21. Unser M, Van De Ville D (2010) Wavelet steerability and the higher-order Riesz transform. IEEE Trans Image Process 19(3):636–652


Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.