
DELIVERABLE 3.3

Workflow implementation and streamlining for high-

throughput image analysis of large-scale studies

Grant agreement no.: 601055 (FP7-ICT-2011-9)

Project acronym: VPH-DARE@IT

Project title: Dementia Research Enabled by IT

Funding Scheme: Collaborative Project

Project co-ordinator: Prof. Alejandro Frangi, University of Sheffield

Tel.: +44 114 22 20153

Fax: +44 114 22 27890

E-mail: [email protected]

Project web site address: http://www.vph-dare.eu

Due date of deliverable Month 24

Actual submission date Month 27

Start date of project April 1st 2013

Project duration 48 months

Work Package & Task WP 3, Task 3.2, 3.3

Lead beneficiary UCL

Editor PMO

Author(s) Nicolas Toussaint, David Cash, Wyke Huizinga

Quality reviewer Peter Metheral, Wiro Niessen

Project co-funded by the European Union within the Seventh Framework Programme

Dissemination level

PU  Public  X
PP  Restricted to other programme participants (including Commission Services)
RE  Restricted to a group specified by the consortium (including Commission Services)
CO  Confidential, only for members of the consortium (including Commission Services)


Issue Record

Version no.  Date      Author(s)                      Reason for modification  Status
1.0          18/06/15  Nicolas Toussaint, David Cash  Initial release          Draft
2.0          30/06/15  Nicolas Toussaint, David Cash  Reviewers' comments      Reviewed
2.1          05/07/15  PMO                            Final check              Finalised

Copyright Notice

Copyright © 2013 VPH-DARE@IT Consortium Partners. All rights reserved. VPH-DARE@IT is an FP7 Project supported by the European Union under grant agreement no. 601055. For more information on the project, its partners, and contributors please see http://www.vph-dare.eu. You are permitted to copy and distribute verbatim copies of this document, containing this copyright notice, but modifying this document is not allowed. All contents are reserved by default and may not be disclosed to third parties without the prior written consent of the VPH-DARE@IT consortium, except as mandated by the grant agreement with the European Commission, for reviewing and dissemination purposes. All trademarks and other rights on third party products mentioned in this document are acknowledged and owned by the respective holders. The information contained in this document represents the views of VPH-DARE@IT members as of the date of its publication and should not be taken as representing the view of the European Commission. The VPH-DARE@IT consortium does not guarantee that any information contained herein is error-free, or up to date, nor makes warranties, express, implied, or statutory, by publishing this document.

Author(s) for Correspondence

Nicolas Toussaint, PhD

[email protected]

University College London

Translational Imaging Group,

Centre for Medical Image Computing

3rd Floor, Wolfson House,

London NW1 2HE

United Kingdom


TABLE OF CONTENTS

1. INTRODUCTION
2. BIOMARKER PIPELINE DESIGN
   2.1. GENERAL PIPELINE SPECIFICATIONS
   2.2. INVENTORY OF IMAGE ANALYSIS TOOLS
   2.3. IMAGE ANALYSIS WORKFLOW BLOCKS
   2.4. INCORPORATION INTO VPH-DARE RESEARCH PLATFORM
3. PER BIOMARKER PIPELINE IMPLEMENTATION
   3.1. BIAS CORRECTION
        3.1.1. Evaluation on testing set
   3.2. WHOLE BRAIN PARCELLATION AND TISSUE SEGMENTATION
        3.2.1. Testing set
   3.3. HIPPOCAMPAL VOLUME PROFILE
        3.3.1. Evaluation on AD / control testing set
   3.4. DIFFUSION PROCESSING
        3.4.1. Evaluation on ADNI retrospective cohort
4. BIOMARKER EXTRACTION ROADMAP
5. CONCLUSIONS
6. REFERENCES
7. ANNEXES
   7.1. WORKFLOW


TABLE OF FIGURES

Figure 1: Diagram of the Nipype workflow environment.
Figure 2: Data and information flow in the research platform.
Figure 3: Correlation of BSI measures when using N3 or N4 for bias correction, presented as Bland-Altman plots.
Figure 4: Brain Parcellation workflow.
Figure 5: Example of whole brain parcellation on one of the 1000 subjects of the RSS.
Figure 6: Distribution of hippocampus volume as a percentage of intracranial volume as a function of age.
Figure 7: Hippocampus segmentations with estimated left and right long axes.
Figure 8: Kernel density estimation in 1D and along the principal axis of the hippocampus.
Figure 9: Output profiles from a healthy subject and a patient suffering from Alzheimer's disease.
Figure 10: Volume Profile Generation workflow.
Figure 11: Hippocampal volume profile on a population of controls and AD patients.
Figure 12: Workflow for diffusion weighted imaging data.
Figure 13: Diffusion-processing pipeline output maps.
Figure 14: Quality Control graphs for Diffusion MRI.
Figure 15: Inter-slice cross-correlation graphs on 216 subjects of the ADNI cohort.
Figure 16: The Taverna Workbench.


1. INTRODUCTION

Over the first two years of the project, VPH-DARE has collected numerous retrospective

imaging studies in dementia into a single repository represented by the VPH-SHARE

infostructure. This provides the ability to determine whether data and results from multiple

datasets can be pooled together in order to provide a better understanding of disease processes

and what factors (genetic, lifestyle, environmental) could influence them. Some of the most

well established biomarkers, as well as some of the most promising for early disease detection

and differential diagnosis, come from imaging. While many of these databases have already been analysed and contain certain derived imaging biomarkers for some datasets, each

database has employed different methodologies, software packages, and program settings to

obtain these values. Thus, it is important to extract the relevant imaging biomarkers from each

of these retrospective studies using standardised pipelines and to make them available to the

consortium for the purposes of mechanistic and phenomenological modelling done in WPs 5

and 6, as well as to provide normal and abnormal distributions to aid in diagnostic decisions as

part of the clinical platform being developed in WP8. This task represents a computationally

expensive endeavour, as there are multiple pipelines that require hours of computing time, and

tens of thousands of datasets from which the biomarkers need to be extracted. The VPH-DARE

research platform offers the opportunity to perform this extraction in a standardised and high-

throughput manner.

This deliverable has close ties with deliverable 3.1, in which we laid out the basic requirements

for key biomarkers we felt would be necessary for extraction, and deliverable 7.2, where we

presented methods for integrating these biomarker pipelines into the research platform. In this

document, we first focus on the design approach for constructing these biomarker pipelines

with the consideration that they will be used within the research platform. Then we discuss the

key biomarker pipelines: how we have optimised the pipeline parameters and any adjustments

that we have made to overcome various challenges that arose during the implementation. We

then show evidence that the biomarkers are performing as expected through the use of

validation test sets for each pipeline. Finally, we present the outline of a plan to complete

extraction of imaging biomarkers from the retrospective database as represented by Milestone

33.

2. BIOMARKER PIPELINE DESIGN

The goal of this deliverable and Milestone 33 is to extract imaging biomarkers from the

retrospective databases. This objective had strong implications in terms of the selection and

subsequent implementation of the pipelines. First, the most commonly used imaging

biomarkers in dementia research are based on high-resolution structural MRI using volumetric

T1-weighted imaging. These biomarkers provide quantitative volumetric assessments of key

brain structures and the longitudinal rate of change in volume of these structures. These are the

most well established biomarkers, with evident changes just before symptom onset and a strong

correlation with clinical severity. All of the retrospective studies in VPH-DARE that contain

imaging have some form of volumetric T1-weighted imaging as part of the protocol. As a result,

we decided to primarily focus on extracting biomarkers from these images. Second, there are

numerous publicly available software packages available to perform the image processing

needed to obtain these biomarkers, and we wanted our design to be flexible and interoperable between these packages, in addition to tools developed in-house, so that the pipeline could

take the best components from each. Finally, we want to provide results to the end users that

are clear and reproducible. We felt that this would involve providing reasonable provenance

information (software versions, computer hardware, etc.) as well as one “validated” version of

the pipeline and corresponding end result rather than multiple versions with different settings.


2.1. GENERAL PIPELINE SPECIFICATIONS

Current neuroimaging software provides a large variety of analysis tools that have been widely adopted by the scientific community. A non-exhaustive list of tools commonly used in the neuroimaging community is presented below:

- FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki) is a library and software suite containing a large panel of basic and complex image processing and manipulation tools
- SPM (http://www.fil.ion.ucl.ac.uk/spm) is a library dedicated to statistical analysis of imaging data
- FreeSurfer (http://surfer.nmr.mgh.harvard.edu) is a software suite especially used for cortical surface study
- Camino (http://cmic.cs.ucl.ac.uk/camino) is a software package for diffusion image processing
- Slicer (http://slicer.org) is a generic software package for medical image computing
- ANTS (http://stnava.github.io/ANTs) is a processing library used for image registration and segmentation

It is therefore crucial to take advantage of this repertoire of existing tools, and the choice of workflow implementation should be driven by its ability to facilitate the integration of such tools. Additionally, for large studies it is crucial that the workflows aimed at extracting imaging biomarkers be reproducible. Furthermore, common image processing techniques such as registration and segmentation will be used numerous times in different workflows. This requires a workflow environment that facilitates the transfer of processing blocks from workflow to workflow.

Such an environment could be achieved using basic shell scripts wrapped around the command-line binaries. This would, however, require additional implementation time: correctly identifying interoperability between software packages, managing the versions of these packages, and recording this information so that it could be saved with the resulting outputs. In some cases, it would be more sensible to do this when integrating into the research platform. Nipype (http://nipy.sourceforge.net/nipype), however, is a Python-based workflow engine that is well adapted to neuroimaging studies and incorporates many of these ancillary capabilities. It therefore comes as a natural choice for the implementation of the biomarker extraction pipelines, so that they can be incorporated as a major part of the VPH-DARE@IT image analysis workflows.
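To make this concrete, the snippet below is a minimal sketch of how two third-party tools become connected nodes in a Nipype workflow; the file names are placeholders, and the actual biomarker pipelines are considerably larger.

```python
import nipype.pipeline.engine as pe
from nipype.interfaces import fsl

# Each tool is wrapped in an interface and becomes a workflow node.
bet = pe.Node(fsl.BET(frac=0.3), name='brain_extract')
flirt = pe.Node(fsl.FLIRT(dof=12), name='affine_to_mni')

bet.inputs.in_file = 't1.nii.gz'           # placeholder input image
flirt.inputs.reference = 'mni_t1.nii.gz'   # placeholder MNI template

# Connecting an output to an input defines the execution graph;
# Nipype handles caching, provenance recording, and re-runs automatically.
wf = pe.Workflow(name='example_pipeline', base_dir='work')
wf.connect(bet, 'out_file', flirt, 'in_file')
wf.run()
```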


Figure 1: Diagram of the Nipype workflow environment (http://nipy.sourceforge.net/nipype).

Distinct image analysis tools are embedded into interfaces that constitute the blocks of a workflow, seamlessly executable and fully reproducible.

2.2. INVENTORY OF IMAGE ANALYSIS TOOLS

As neuroimaging is a mature area of research, many software tools to analyse and process

neuroimaging data have emerged throughout the past decades.

Partners within the consortium have a strong background in imaging software, and in-house analysis tools play an important role in the implementation of biomarker extraction workflows. Amongst them, NifTK [1] is an in-house medical imaging software suite that contains numerous command-line applications for automated image-processing tasks such as registration and segmentation.

2.3. IMAGE ANALYSIS WORKFLOW BLOCKS

Most tools from the libraries described in Sec. 2.1 already have their interfaces integrated in the Nipype environment. Therefore, all FSL, FreeSurfer, SPM, and ANTS binaries can be used as building blocks. A significant effort has taken place to implement interfaces for additional in-house tools contained within the NifTK library (a minimal sketch of such an interface wrapper is given after the list below). Amongst them, the following three interfaces are commonly used in the implementation of the image-based biomarker extraction workflows:

- NiftyReg (http://sourceforge.net/projects/niftyreg) contains programs to perform rigid, affine and non-linear registration for medical images
- NiftySeg (http://sourceforge.net/projects/niftyseg) is dedicated to EM-based segmentation and label fusion algorithms for medical images
- NiftyFit (closed-source) provides a selection of routines for model fitting to different types of MRI data, especially used for diffusion tensor fitting
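As an illustration of this interface effort, the sketch below wraps NiftyReg's reg_aladin affine registration binary as a Nipype command-line interface. It is a minimal example exposing only a few of the tool's flags, not the full NifTK interface (recent Nipype releases ship complete niftyreg interfaces).

```python
import os
from nipype.interfaces.base import (CommandLine, CommandLineInputSpec,
                                    TraitedSpec, File)

class RegAladinInputSpec(CommandLineInputSpec):
    ref_file = File(exists=True, mandatory=True, argstr='-ref %s',
                    desc='reference (fixed) image')
    flo_file = File(exists=True, mandatory=True, argstr='-flo %s',
                    desc='floating (moving) image')
    aff_file = File('affine.txt', usedefault=True, argstr='-aff %s',
                    desc='output affine transformation')
    res_file = File('resampled.nii.gz', usedefault=True, argstr='-res %s',
                    desc='output resampled floating image')

class RegAladinOutputSpec(TraitedSpec):
    aff_file = File(desc='affine transformation')
    res_file = File(desc='resampled image')

class RegAladin(CommandLine):
    """Minimal Nipype wrapper around the reg_aladin binary."""
    _cmd = 'reg_aladin'
    input_spec = RegAladinInputSpec
    output_spec = RegAladinOutputSpec

    def _list_outputs(self):
        outputs = self.output_spec().get()
        outputs['aff_file'] = os.path.abspath(self.inputs.aff_file)
        outputs['res_file'] = os.path.abspath(self.inputs.res_file)
        return outputs
```

Once defined this way, the tool can be dropped into any workflow as a node, exactly like the stock FSL or SPM interfaces.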

2.4. INCORPORATION INTO VPH-DARE RESEARCH PLATFORM

High throughput analysis of these biomarker pipelines will be achieved by incorporating the pipelines within the VPH-DARE@IT research platform, so that the data, derived results, and pipelines for all projects exist in a single location, allowing for cross-project queries and analysis. For the purposes of the biomarker extraction from the retrospective databases, there was a large design question: should the workflows be incorporated into the research platform as individual atomic units, and then constructed within the research platform? Or should they be

imported as a complete “black box” tested outside of the research platform before use for this

task or by other users? Some users may want to have access to these atomic units in order to

build novel workflows up from scratch on the research platform. For the purpose of biomarker

extraction from retrospective databases, we opted for the latter approach of creating complete

"black box" pipelines for several reasons. First, we believe that the research platform, and

specifically these biomarkers, should be geared towards end users/researchers who are more

clinically oriented and are interested in using or generating established biomarkers over large

studies, rather than technical end-users who want to further develop or optimise pipelines for

their analysis. These technical end users could generate numerous pipelines and values resulting

from these pipelines that are only slightly different, which could lead to confusion from other

users. Second, it is more efficient to test and validate one complete biomarker pipeline off-line and incorporate it into the research platform than to incorporate all of its individual atomic components. Efficiency is important not just in the execution of these pipelines, but also in their design, as this will be on the critical path for biomarker extraction.

The research platform and its components have been described in D7.1 and D7.2. The platform

dedicates appropriate resources from virtual machines to complete a user-specific task. The user

chooses a set of data to perform the task on, and selects a workflow dedicated to the extraction

of a specific biomarker. The key biomarker extraction pipelines and the settings used for testing

are described in the next section. They have been integrated in the virtual machine template in

order for the research platform to easily access them and perform the required task. The outputs

of the pipeline located on the virtual machine are then linked back to the web-based platform

for access and further analysis from the user. A diagram explaining the information flow in the

research platform is presented in Figure 2.

Figure 2: Data and information flow in the research platform. The pipelines are stored in the

platform and are at the disposition of the user. The platform facilitates the application of

validated biomarker extraction workflows on large cohorts for high throughput analysis.


The groups at UCL and USFD have been collaborating in order to integrate these imaging

pipelines into the research platform. Initially, a newly implemented pipeline, whether it is a

command-line script, a binary or a Nipype-based pipeline, needs to be installed as a dedicated

“Application” (see D7.3). This results in the creation of a dedicated resource in the platform.

For instance, the UCL image-based biomarker extraction pipelines' Application is found at https://dare.vph-share.eu/resources/7a78a3e9-f6a6-4f32-85cd-bea8f10a242d/. The Application receives a corresponding XML endpoint to be used later for pipeline integration. Further details of the incorporation and execution of these pipelines are found in the Annex.

3. PER BIOMARKER PIPELINE IMPLEMENTATION

Below are the key biomarkers that have been tested and provided to the team of WP7 for integration into the research platform. As the purpose of these biomarkers has been discussed in previous deliverables, we only provide enough background here to make clear what optimisation and testing has been done.

3.1. BIAS CORRECTION

Bias correction is one of the most common pre-processing steps performed on structural MR

images. It aims at correcting for low frequency signal variations induced by inhomogeneities

of the static magnetic field of the MRI scanner.

The algorithm used in this pipeline is an improved multi-level version of nonparametric non-

uniform intensity normalization N3 [2]. The algorithm has 4 major input parameters:

- Down-sampling level (default: 1): As the bias field corresponds to low frequency signal variation, it is often sufficient to work on a down-sampled version of the input high-resolution structural T1 image.
- Maximum iterations (default: 50): The optimization occurs iteratively until the number of iterations exceeds the maximum specified by this variable.
- Convergence (default: 0.001): The threshold used to determine convergence (the standard deviation of the ratio between subsequent field estimates is used).
- FWHM (default: 0.15): The full width at half maximum of the Gaussian used to model the bias field.

In addition to these inputs, bias correction often performs better when the algorithm is provided with a rough segmentation of the intracranial volume, as including low intensity voxels in the background causes instability in the algorithm. We obtain this segmentation automatically using a quick registration between the input T1 and the MNI template, followed by the resampling of the corresponding MNI mask to the input T1 space. The resulting mask is used for optimising the bias correction within the embedded volume.

The outputs of the pipeline consist of the bias corrected image, the multiplicative bias field, and (as an indication) the total intracranial mask used to estimate the bias field.
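For illustration, a roughly equivalent invocation through Nipype's standard ANTs N4 interface is sketched below; the in-house multi-level pipeline differs in detail, and the file names are placeholders. The four parameters above map onto the interface as commented.

```python
from nipype.interfaces.ants import N4BiasFieldCorrection

n4 = N4BiasFieldCorrection()
n4.inputs.dimension = 3
n4.inputs.input_image = 't1.nii.gz'        # placeholder structural T1
n4.inputs.mask_image = 'tiv_mask.nii.gz'   # TIV mask from the MNI registration
n4.inputs.shrink_factor = 1                # down-sampling level (default: 1)
n4.inputs.n_iterations = [50]              # maximum iterations (default: 50)
n4.inputs.convergence_threshold = 0.001    # convergence (default: 0.001)
# The FWHM of N4's histogram-sharpening Gaussian (default: 0.15) is left
# at its default here.
n4.inputs.save_bias = True                 # also write the multiplicative bias field
result = n4.run()                          # result.outputs.output_image, .bias_image
```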

3.1.1. Evaluation on testing Set

The pipeline has been thoroughly tested against an in-house UCL database that consists of 180 subjects from a large-scale phase III trial in Alzheimer's disease, with data coming from multiple MR scanners. The result of the bias correction itself is not important; rather, it is the impact of the bias correction on the Boundary Shift Integral (BSI), a measurement of atrophy between two scans and often an exploratory endpoint in clinical trials. The results from the new bias correction were compared to values previously calculated with the N3 algorithm, which was used to compute the BSI endpoint as it was submitted back to the sponsor. The agreement between the resulting BSI measures is presented in Figure 3 as a Bland-Altman plot. These results show


that there is no systematic difference when using N4 with respect to using N3. The multi-resolution aspect of the N4 implementation explains the apparent variance of these differences.

Figure 3: Correlation of BSI measures when using N3 or N4 for bias correction as pre-

processing, presented as Bland-Altman plots.
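For reference, a Bland-Altman comparison of two BSI series like the one in Figure 3 can be produced with a few lines of numpy/matplotlib; this is a generic sketch, not the exact code used to generate the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

def bland_altman(bsi_n3, bsi_n4):
    """Agreement between BSI values computed after N3 vs N4 bias correction."""
    a, b = np.asarray(bsi_n3, float), np.asarray(bsi_n4, float)
    mean, diff = (a + b) / 2.0, a - b
    md, sd = diff.mean(), diff.std(ddof=1)
    plt.scatter(mean, diff, s=10)
    plt.axhline(md, color='grey')                       # mean difference
    plt.axhline(md + 1.96 * sd, color='grey', ls='--')  # limits of agreement
    plt.axhline(md - 1.96 * sd, color='grey', ls='--')
    plt.xlabel('mean BSI (N3, N4)')
    plt.ylabel('BSI difference (N3 - N4)')
    return md, sd
```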

The pipeline is multi-threaded; its computation time scales with the input image resolution. The typical run time is 10 minutes on a single 2.5 GHz Intel CPU core for an image with a 1.1 mm³ voxel size.

3.2. WHOLE BRAIN PARCELLATION AND TISSUE SEGMENTATION

The primary high throughput pipeline is the complete, automated parcellation of the cortical

and subcortical grey regions from a structural T1 image. In this implementation we use an

algorithm based on a label propagation scheme designated as Geodesic Information Flow (GIF)

[3]. This scheme necessitates the use of a template database where structural images and

corresponding segmented regions are used in a graph environment in order to transport or

propagate labels towards the target image. In this implementation we use a database carefully

constructed using the OASIS (65 subjects) and ADNI data (85 subjects). Unlike other template

libraries, this template contains data from subjects with a wide age range, some of whom are

affected by Alzheimer’s disease. Inclusion of similar subjects often improves the performance

of the parcellation. The template labels are based on the braincolor protocol

(http://braincolor.mindboggle.info/). The resulting segmented regions can be used for volume-based statistical analyses to help characterise biomarkers such as atrophy or other volume-related measures.

The pipeline input is the structural T1 image. First, the scan is linearly registered to the MNI T1 atlas. The MNI TIV mask is subsequently resampled to the target image and dilated (10 voxels). The dilated mask is used to crop the input image, and the resulting affine transformation is composed with each database subject's transformation in order to initialise the non-linear registrations. Non-linear registrations are then performed between each subject of the database and the input image. To minimise the computation time, the input image is cropped to reduce the volume to the minimum region of interest. These non-linear registrations constitute the most computationally expensive blocks of the pipeline (90%). For all registrations, the NiftyReg package (http://sourceforge.net/projects/niftyreg) has been used. Once all registrations are completed, the final label propagation to the target T1 image is performed using the GIF algorithm [4]. The components have been assembled using Nipype and the workflow has been integrated in the platform as a black box. The workflow is described in Figure 4; a minimal sketch of the registration fan-out is given below.
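The sketch assumes Nipype MapNodes with NiftyReg interfaces (now shipped with Nipype; the project originally used its own NifTK wrappers). The template file names are placeholders, the 150-subject count follows the 65 OASIS plus 85 ADNI description above, and the final GIF fusion step is an in-house interface that is only indicated.

```python
import nipype.pipeline.engine as pe
from nipype.interfaces import niftyreg

templates = ['template_%03d.nii.gz' % i for i in range(150)]  # 65 OASIS + 85 ADNI

# One affine and one non-linear registration per template subject,
# fanned out over the database with MapNodes.
aff = pe.MapNode(niftyreg.RegAladin(), name='affine', iterfield=['flo_file'])
nrr = pe.MapNode(niftyreg.RegF3D(), name='nonlinear',
                 iterfield=['flo_file', 'aff_file'])

aff.inputs.ref_file = 'cropped_t1.nii.gz'   # cropped, bias-corrected input image
aff.inputs.flo_file = templates
nrr.inputs.ref_file = 'cropped_t1.nii.gz'
nrr.inputs.flo_file = templates

wf = pe.Workflow(name='gif_parcellation', base_dir='work')
wf.connect(aff, 'aff_file', nrr, 'aff_file')  # affine initialises the non-linear step
# ... the warped templates and labels then feed the in-house GIF fusion node.
wf.run(plugin='MultiProc')                    # registrations run in parallel
```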


Figure 4: Brain Parcellation workflow. A database is used to propagate template labels to the

target input structural image

The pipeline outputs consist of several images:

- The cropped and fully bias-corrected image
- A tissue segmentation image: a 4D image in which each layer gives, per voxel, the probability of belonging to, respectively, the CSF, the cortical grey matter, the white matter, the deep grey matter, and the brainstem
- The brain parcellation labelled image

The key outputs that will be reused are the tissue segmentation and the parcellation image. The pipeline has been tested on small databases, and experts have evaluated the resulting segmentations.
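As an example of how the parcellation output is consumed downstream, the sketch below derives per-label volumes in millilitres from the labelled image using nibabel. The label IDs depend on the braincolor protocol and are treated as hypothetical here.

```python
import numpy as np
import nibabel as nib

def regional_volumes(parcellation_file):
    """Return {label: volume in ml} for every non-zero label in a parcellation."""
    img = nib.load(parcellation_file)
    labels = np.asarray(img.dataobj).astype(int)
    voxel_ml = float(np.prod(img.header.get_zooms()[:3])) / 1000.0  # mm^3 -> ml
    counts = np.bincount(labels.ravel())
    return {lab: int(n) * voxel_ml
            for lab, n in enumerate(counts) if lab != 0 and n > 0}

# Example: hippocampal volume as a percentage of total intracranial volume,
# assuming hypothetical label IDs for the two hippocampi and a TIV estimate.
# vols = regional_volumes('parcellation.nii.gz')
# hippo_pct = 100 * (vols[48] + vols[49]) / tiv_ml
```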

3.2.1. Testing Set

The whole brain parcellation pipeline was applied to a subset of 1000 T1 weighted scans from

the Rotterdam Scan Study [5]. Because the data from this study is not publicly available, the

pipeline had to be made compatible with the computing cluster of the Erasmus MC, the owner

of the study data. The compatibility was facilitated by the Nipype implementation of the

parcellation pipeline, and allowed us to compute brain region volumes of 1000 subjects in an

age range of 45 – 92 years. These brain region volumes will be compared to volumes computed

by other brain parcellation pipelines, such as FreeSurfer and in-house developed methods.

Figures 5 and 6 below show results of a parcellation and the distributions of hippocampal and thalamus volume in the age range of 45–92 years. The volumes are given as a percentage of intracranial volume.

Figure 5: Example of whole brain parcellation on one of the 1000 subjects of the RSS.

Figure 6: Top: parcellation showing the hippocampus in red. Bottom: distribution of the hippocampus volume as a percentage of intracranial volume as a function of age.

3.3. HIPPOCAMPAL VOLUME PROFILE

There is evidence that different forms of neurodegenerative diseases have different patterns of

atrophy in the temporal lobe in the anterior-posterior gradient [6]. While the overall loss of

volume in specifically vulnerable regions like the hippocampus is informative, more localised

characterisation of the atrophy within the structure could be provide more information in terms

of differential diagnosis. A pipeline dedicated to the generation of this regional analysis from a

segmented hippocampus has been implemented. The components have been assembled using

Nipype and the workflow has been integrated in the platform as a black box. The input of the

pipeline is the segmentation in the form of a binary mask indicating which voxels in the image are contained within the hippocampus. From this binary mask, a volume profile is created along the long axis of the hippocampi, running from the anterior head to the posterior tail. Using a Gaussian kernel density estimator [7], we obtain a continuous function of local volume at any

given point of the hippocampal principal axis. The area below this curve therefore represents

the total hippocampus volume. Classification and statistical analysis along the principal axis

will be performed to determine which areas are the most sensitive in detecting differences

between the two disease groups.

Figure 7: Hippocampus segmentations (left) with estimated left and right long axes (middle and

right) superimposed on structural T1 image.

The pipeline takes advantage of the global coordinate system in order to distinguish between

the left and the right hippocampi. From each structure the long axis of the group of voxels is

calculated using principal component analysis. The volume profile is then generated along a

normalised axis coordinate using a kernel density estimation technique (see Figure 8).

Figure 8: (left) Kernel density estimation in 1D (black line) with sparse data points (blue

markers). The blue dots represent the data points and the black continuous line represents the

density estimation. (right) Kernel density estimation is applied along the principal axis of the

hippocampus.
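A minimal sketch of this procedure, assuming a binary NIfTI mask and using scipy's Gaussian KDE, is given below; the in-house implementation additionally normalises the axis coordinate and separates left and right structures.

```python
import numpy as np
import nibabel as nib
from scipy.stats import gaussian_kde

def volume_profile(mask_file, n_points=100):
    """Volume profile of a binary hippocampus mask along its principal axis."""
    img = nib.load(mask_file)
    ijk = np.argwhere(np.asarray(img.dataobj) > 0)   # voxel indices inside the mask
    xyz = nib.affines.apply_affine(img.affine, ijk)  # to world coordinates
    centred = xyz - xyz.mean(axis=0)
    # Long axis = first principal component of the voxel cloud (PCA via SVD).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    proj = centred @ vt[0]                           # position along the long axis
    # KDE of projected positions, rescaled so the curve integrates to total volume.
    total_vol = len(ijk) * float(np.prod(img.header.get_zooms()[:3]))
    axis = np.linspace(proj.min(), proj.max(), n_points)
    density = gaussian_kde(proj)(axis) * total_vol
    return axis, density
```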


Figure 9: Two examples of output profiles from a healthy subject (left) and a patient suffering

from Alzheimer’s disease (right). The graph shows the left (red) and right (green) hippocampal

volume profiles.

Figure 10: Volume Profile Generation workflow. Kernel density estimation is used to estimate

the continuous function from sparse voxel volumes projected along the hippocampus axis.

3.3.1. Evaluation on AD / control testing set

As an example of application, this pipeline has been applied to a small database of 60 subjects,

consisting of 23 controls and 40 patients suffering from Alzheimer’s disease (AD). The results

are shown in Figure 11. They demonstrate a difference in global volume between healthy

controls and AD patients. More interestingly, this difference is more pronounced in the anterior

part of the hippocampus.


Figure 11: Hippocampal volume (in TIV percentage) profile on a population of 27 controls

(grey) and 43 AD patients (black) between the anterior and the posterior part.

3.4. DIFFUSION PROCESSING

Diffusion MRI allows for the depiction of white matter microstructure [8]. Changes in the white

matter due to neuronal dysfunction and other disease processes caused by neurodegenerative

dementias might be picked up by this imaging modality. However, this type of imaging is

inherently much lower resolution and noisier than conventional structural MRI. Image quality

problems are further exacerbated by its high sensitivity to physiological motion [9]. This

modality also generates thousands of images, and manually identifying quality issues in these scans is not feasible. We have provided the VPH-DARE@IT community with

an in-house pipeline dedicated to the pre-processing and processing of diffusion data.

Additionally, this pipeline allows for the automatic detection of major artefacts that can affect the data. The workflow is described in Figure 12. The input data consists of:

- N Diffusion Weighted Images (DWIs)
- M B=0 non-weighted images (B0)
- The T1 structural image
- The magnitude and phase field map images, which are used for correction of echo planar imaging distortion artefacts (optional)

Due to the duration of scan acquisition (of the order of 5-10 minutes), diffusion MRI is prone to subject motion. To detect such motion and partially correct for it, a common technique consists of linearly registering the diffusion-weighted images to the non-weighted one. In the case where M > 1, we first perform a groupwise (rigid) registration of the B0 images. Due to the difference in signal characteristics between DWIs and B0s, it is useful to perform the DWI to B0 registration in log-space. This registration procedure corrects for motion-induced and eddy-current affine distortions between DWIs. Another typical artefact is


due to magnetic field inhomogeneities and induces low frequency susceptibility distortions. In order to correct for this, we use a phase unwrapping technique [10] that calculates the multiplicative distortion field from the magnitude and phase field map images. When these images are not available, a non-linear registration between the B0 and the T1 is performed instead. Since the susceptibility distortion is of low frequency, we constrain the registration to smooth deformations only.

The affine transformations and non-linear distortion fields are composed in order to obtain a final displacement field for each DWI and B0 image. Each image is then interpolated using the displacement fields; a (positivity-)constrained sinc interpolation is used. Note that in this workflow each image is interpolated only once, in order to avoid interpolation-induced over-smoothing. The resulting images and gradients are used to estimate a single tensor model at the voxel level. For quality control purposes, two separate graphs are calculated. First, from the DWI to B0 registration we extract the rotation parameters of the transformation in order to quantify subject motion during the scan. Second, normalised cross-correlation (NCC) is computed between adjacent slices in all motion-corrected images, as significant drops in NCC can help identify signal dropout or motion artefacts that result in banding (zebra artefact) due to the interleaved acquisition (a sketch of this computation follows the output list below). The processing steps are summarised in the diagram in Figure 12. The output of the workflow consists of the following:

- The corrected B0-DWI 4D image

- The corrected gradient table

- The estimated tensor map

- Tensor-based scalar maps (FA, MD, AD, RD, RGB)

- The individual DWI to B0 affine transformations

- The subject rotation graph

- The inter-slice cross-correlation graph
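A sketch of the inter-slice NCC quality-control measure is given below, assuming each motion-corrected volume is available as a 3D numpy array; the graphs in Figures 14 and 15 are built from such per-volume curves.

```python
import numpy as np

def interslice_ncc(volume):
    """Normalised cross-correlation between adjacent axial slices of a 3D volume.
    A pronounced drop flags signal dropout or interleave (zebra) artefacts."""
    ncc = []
    for k in range(volume.shape[2] - 1):
        a = volume[:, :, k].ravel().astype(float)
        b = volume[:, :, k + 1].ravel().astype(float)
        a -= a.mean()
        b -= b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        ncc.append(float(a @ b) / denom if denom > 0 else 0.0)
    return np.array(ncc)
```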

Some examples of diffusion processing pipeline outputs are shown in Figure 13, while quality

control graphs are shown in Figure 14.


Figure 12: Workflow for diffusion weighted imaging data. The diffusion-weighted images are

linearly registered to the average B0 image. Field maps and T1 images are used to estimate

susceptibility distortion. The resulting corrected images are used for tensor fitting.

Figure 13: The diffusion-processing pipeline outputs maps depicting the white matter

arrangement. (Left) T1 weighted image. (Right) corresponding FA map colour-coded with

tissue orientation.


Figure 14: Quality Control graphs for Diffusion MRI. (Top) DWI image showing some

significant signal dropouts. (Bottom) corresponding inter-slice cross-correlation for B0 (red)

and DWI (blue) images, where the problematic volume is automatically detected.

3.4.1. Evaluation on ADNI retrospective cohort

As an example, this workflow has been applied to a large portion of the ADNI retrospective

cohort. The purpose of this study is to demonstrate the capability of the pipeline to automatically detect outliers in a large database. The data consists of 216 subjects: 67 healthy controls, 91 AD patients, and 58 patients with mild cognitive impairment. The diffusion pipeline was

applied and the resulting inter-slice normalised cross-correlation was extracted for each subject.

The results are presented in Figure 15. There are 13 subjects that are highlighted as outliers

with a simple threshold on the cross-correlation. This technique can be used in order to

automatically reject subjects from large cohorts within a statistical study, where the large

number of subjects and images does not allow exhaustive manual quality control. These results were presented in the Y2 review as a clinically relevant exemplar of a biomarker extraction pipeline.
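At cohort level, the outlier detection reduces to thresholding each subject's worst inter-slice NCC; a sketch follows, where the 0.6 threshold is purely illustrative since the deliverable does not state the value used.

```python
import numpy as np

def flag_outliers(per_subject_ncc, threshold=0.6):
    """Indices of subjects whose minimum inter-slice NCC falls below threshold."""
    minima = np.array([float(np.min(curve)) for curve in per_subject_ncc])
    return np.nonzero(minima < threshold)[0]
```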


Figure 15: Inter-slice cross-correlation graphs on 216 subjects of the ADNI cohort.

Thresholding allows the automatic detection of 13 outliers containing significant signal

dropouts.

4. BIOMARKER EXTRACTION ROADMAP

Milestone 33, the extraction of biomarkers from the retrospective studies, is scheduled for the end of Y3. This milestone will involve extraction of biomarkers from over 20,000 images, including the Rotterdam study. We will work with all the partners involved in Task 3.4 who will be contributing biomarkers to the milestone, and plan out the process of finalising the workflows, setting them up within the research platform, executing the pipelines on appropriate data from each of the studies, and reviewing the results so that completion occurs by the milestone date. Progress will be monitored by tracking updates of the research platform within WP7, and will be reported back to all partners through monthly teleconferences and a simple spreadsheet dashboard. During the teleconferences, we

will discuss any potential problems that could cause deviation from the plan. This exercise will

be supported by WP7, who will assist with the incorporation of workflows into the research

platform. WP5 and WP6 will use these biomarkers and incorporate them into the models

developed in these work packages.


5. CONCLUSIONS

This deliverable provides clear guidelines on the implementation of the biomarker extraction pipelines for the retrospective studies. These pipelines will result in a fully standardised set of biomarkers across thousands of datasets from multiple retrospective cohorts, which will allow new research questions to be explored that cannot normally be addressed within any single disease cohort.

6. REFERENCES

1. Clarkson, M.J., et al. The NifTK software platform for image-guided interventions: platform overview and NiftyLink messaging. International Journal of Computer Assisted Radiology and Surgery, 2014: p. 1-16.
2. Tustison, N.J., et al. N4ITK: improved N3 bias correction. IEEE Transactions on Medical Imaging, 2010. 29(6): p. 1310-1320.
3. Cardoso, M.J., et al. Geodesic information flows. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2012. Springer Berlin Heidelberg, 2012: p. 262-270.
4. Cardoso, M.J., et al. Geodesic Information Flows: Spatially-Variant Graphs and Their Application to Segmentation and Fusion. 2015.
5. Ikram, M.A., et al. The Rotterdam Scan Study: design and update up to 2012. European Journal of Epidemiology, 2011. 26(10): p. 811-824.
6. Chan, D., et al. Patterns of temporal lobe atrophy in semantic dementia and Alzheimer's disease. Annals of Neurology, 2001. 49(4): p. 433-442.
7. Turlach, B.A. Bandwidth selection in kernel density estimation: a review. Université catholique de Louvain, 1993.
8. Westin, C.-F., et al. Processing and visualization for diffusion tensor MRI. Medical Image Analysis, 2002. 6(2): p. 93-108.
9. Tournier, J.D., S. Mori, and A. Leemans. Diffusion tensor imaging and beyond. Magnetic Resonance in Medicine, 2011. 65(6): p. 1532-1556.
10. Daga, P., et al. Susceptibility artefact correction using dynamic graph cuts: application to neurosurgery. Medical Image Analysis, 2014. 18(7): p. 1132-1142.


7. ANNEXES

7.1. WORKFLOW

Here is a simplified step-by-step procedure for the integration and testing of a new pipeline in the platform:

1) Request credentials for the DARE portal: https://dare.vph-share.eu.
2) Download and install Taverna 2.5.0 [2], Java 7 [3], and the VPH-SHARE Taverna plugin [4].
3) From Taverna:
   a) Import the VPH-SHARE service using your application end-point.
   b) Drag and drop the required service to the design panel (see Figure 16).
   c) Provide inputs and outputs relative to the DARE portal's filestore architecture [5].
   d) Run the workflow from the Taverna menu.
4) From the DARE portal:
   a) Monitor the status of the pipeline from the workflow dashboard [6].
   b) Upload the tested workflow in the portal using this help page.

Figure 16: The Taverna Workbench. During the integration process, the user needs to import

the newly created service (top-left panel) into the workbench (main panel) and connect inputs

and outputs of the pipeline (in green) with the DARE portal nodes (in blue). The pipeline can

then be run from the menu.

[2] http://www.taverna.org.uk/download/workbench/2-5/core/
[3] http://www.oracle.com/technetwork/java/javase/downloads/index.html
[4] http://repository.cistib.org/nexus/content/repositories/releases/
[5] https://portal.vph-share.eu/filestore
[6] https://dare.vph-share.eu/applications/#switchToWorkflowsView