![Page 1: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/1.jpg)
www.ci.anl.govwww.ci.uchicago.edu
DTI Image Processing Pipeline and Cloud Computing EnvironmentKyle ChardComputation InstituteUniversity of Chicago and Argonne National Laboratory
![Page 2: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/2.jpg)
www.ci.anl.govwww.ci.uchicago.edu
2 DTI Pipelines and Cloud Infrastructure
Introduction
• DTI image analysis requires the use of many tools– QC, Registration, ROI Marking, Fiber Tracking, ..
• Constructing analyses is challenging– Data & tool discovery, selection, orchestration, ..
• We have made huge strides in terms of data– Data formats, repositories, protocols, metadata, CDEs
• We now need infrastructure to reduce the barriers that exist between data providers, tool developers, researchers, and clinicians– Big Science. Small Labs
o We have exceptional infrastructure for the 1%, what about the 99%?
![Page 3: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/3.jpg)
www.ci.anl.govwww.ci.uchicago.edu
3 DTI Pipelines and Cloud Infrastructure
Common Approach to Analysis
(Re)Run Script
Install
Modify
Camino
![Page 4: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/4.jpg)
www.ci.anl.govwww.ci.uchicago.edu
4 DTI Pipelines and Cloud Infrastructure
How can we improve?
• We need a platform where users can easily construct and execute analyses– Using best of bread tools and pipelines – Abstracting low level infrastructure and platform
heterogeneity– Supporting automation and parallelism– Supporting experimentation
=> Make existing tools and common analyses mundane building blocks
![Page 5: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/5.jpg)
www.ci.anl.govwww.ci.uchicago.edu
5 DTI Pipelines and Cloud Infrastructure
DTI Metric Reproducibility Pipeline
• Ultimate Goal: Investigate the feasibility of using DTI in clinical practice
• Automatic calculation of DTI metrics (FA, MD) from 48 automatically generated ROIs– Using existing tools to create a reusable analysis
workflow that can be easily repeated – Investigate the ability to scale analyses over large
datasets• Explore the reproducibility over a group of 20
subjects with 4 scans spread over 2 sessions
![Page 6: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/6.jpg)
www.ci.anl.govwww.ci.uchicago.edu
6 DTI Pipelines and Cloud Infrastructure
DTI Processing Pipeline (1)
1. ECC DTI (FSL)
2. BET DTI (FSL)
4. Linear Registration DTI / T1 (FSL FLIRT)
5. DTI Fitting (FSL/Camino)7. Non-linear Registration T1/Template (FSL FNIRT)
9. Transform FA/MD to MNI space (FSL Applywarp)
8. Calculate ROI Mean FA/MD (AFNI 3dmaskave)
3. BET T1 (FSL)
DTI T1BVEC & BVAL
Template
AtlasMask
7. Linear Registration T1/Template (FSL FLIRT)
![Page 7: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/7.jpg)
www.ci.anl.govwww.ci.uchicago.edu
7 DTI Pipelines and Cloud Infrastructure
DTI Processing Pipeline (2)
1. ECC DTI (FSL)
2. BET DTI (FSL)
3. DTI Fitting (FSL/Camino)
6. Calculate ROI Mean(3dmaskave)
DTI BVEC & BVAL
Atlas Mask
FA image MD image
Linear Registration(FSL FLIRT)
Non- Linear Registration(FSL FNIRT)
FA Template
FA in MNI space MD in MNI space
Apply Warp coefficient
![Page 8: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/8.jpg)
www.ci.anl.govwww.ci.uchicago.edu
8 DTI Pipelines and Cloud Infrastructure
Globus Genomics• SaaS for genomics• Graphical interface for
creation and execution• Supports ondemand
provisioning based on pricing policies
• Tools installed dynamically when required
XNAT Pipeline Engine• Defined by code (XML
+ scripts)• Overhead to include
tools, develop interfaces and create pipelines
• Difficult to change tools/pipelines
• Some support for parallelization
Scripts• Bash scripts written to
execute tools on a single computer
• Time consuming, error prone, hard to transfer knowledge
• Little support for parallelization
Approaches for Implementing Pipelines
![Page 9: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/9.jpg)
www.ci.anl.govwww.ci.uchicago.edu
9 DTI Pipelines and Cloud Infrastructure
DTI Pipeline Platform
GlobusTransfer
Galaxy
Condor
Shared File System
DynamicScheduler
Galaxy & Manager Dynamic Worker Pool
…
GlobusEndpoints
![Page 10: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/10.jpg)
www.ci.anl.govwww.ci.uchicago.edu
10 DTI Pipelines and Cloud Infrastructure
DTI Pipelines in the Cloud
GlusterGridFTPCondor
NFS
Schedule
Camino
![Page 11: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/11.jpg)
www.ci.anl.govwww.ci.uchicago.edu
11 DTI Pipelines and Cloud Infrastructure
DTI Pipelines in Galaxy
![Page 12: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/12.jpg)
www.ci.anl.govwww.ci.uchicago.edu
12 DTI Pipelines and Cloud Infrastructure
Cloud Computing
• Leverages economies of scale to facilitate utility models• Pay only for resources used• 1 * 100 hours == 100 * 1 hour
• On-demand and elastic access to “unlimited” capacity• Addresses fluctuating requirements
• Web access to data through defined interfaces
• Platform as a Service– No management of hardware or
low level tools
Infrastructure as a Service
Platform as a Service
Software as a Service
![Page 13: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/13.jpg)
www.ci.anl.govwww.ci.uchicago.edu
13 DTI Pipelines and Cloud Infrastructure
Challenges Moving to the Cloud
• Resource Selection: Comparing price, capabilities, performance, instance types (EBS, Instance store), tool performance
• Tool Selection and Management: Finding tools, installing, configuring and using them in different environments
• Analysis/Resource Management: Developing structured and repeatable analyses with different tools.
• Data transfer: Moving large amounts of data in/out of Cloud environment reliably and efficiently
• Scale and Parallelism: Scaling analyses by efficiently parallelizing across elastic infrastructure
• Security: Data and computation security - HIPAA?
![Page 14: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/14.jpg)
www.ci.anl.govwww.ci.uchicago.edu
14 DTI Pipelines and Cloud Infrastructure
Amazon EC2 Pricing
System Specifications Pricing
CPU Units CPU Cores Memory On-Demand Spot (Low) Spot (High)
m1.large 4 2 7.5 0.24 0.026 5.5
m1.xlarge 8 4 15 0.48 0.052 0.64
m3.xlarge 13 4 15 0.5 0.058 0.058
m3.2xlarge 26 8 30 1 0.0115 0.115
m2.xlarge 6.5 2 17.1 0.41 0.035 0.36
m2.2xlarge 13 4 34.2 0.82 0.07 3
m2.4xlarge 26 8 68.4 1.64 0.14 0.14
![Page 15: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/15.jpg)
www.ci.anl.govwww.ci.uchicago.edu
15 DTI Pipelines and Cloud Infrastructure
Spot Pricing Volatility
![Page 16: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/16.jpg)
www.ci.anl.govwww.ci.uchicago.edu
16 DTI Pipelines and Cloud Infrastructure
Instance Performance and Pricing
m1.large
m1.xlarge
m3.xlarge
m3.2xlarge
m2.xlarge
m2.2xlarge
m2.4xlarge
0
20
40
60
80
100
120
0
0.3
0.6
0.9
1.2
1.5
EBS Instance Store On-DemandSpot (Low) Spot (High)
Tim
e (M
inut
es)
Cost
per
Sub
jec
($)
![Page 17: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/17.jpg)
www.ci.anl.govwww.ci.uchicago.edu
17 DTI Pipelines and Cloud Infrastructure
Pricing - Multiple Analyses Per Node
m1.large
m1.xlarge
m3.xlarge
m3.2xlarge
m2.xlarge
m2.2xlarge
m2.4xlarge
00.05
0.10.15
0.20.25
0.30.35
0.40.45
0.5On-Demand Spot (Low) Spot (High)
Cost
per
Sub
ject
($)
![Page 18: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/18.jpg)
www.ci.anl.govwww.ci.uchicago.edu
18
Elastic Startup Cost
DTI Pipelines and Cloud Infrastructure
New Worker Existing Worker
0:00:00
0:15:00
0:30:00
0:45:00
1:00:00
1:15:00
ROI Calculation
Tensor Fitting
ECC & Registration
Contextualize
Spot Price
Queue
Tim
e
![Page 19: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/19.jpg)
www.ci.anl.govwww.ci.uchicago.edu
19 DTI Pipelines and Cloud Infrastructure
Data Transfer with Globus Online
• Reliable file transfer, sharing, syncing.– Easy “fire and forget” file transfers– Automatic fault recovery– High performance– Across multiple security domains
• In place sharing of files with users and groups
• No IT required.– Software as a Service (SaaS)
o No client software installationo New features automatically
available
![Page 20: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/20.jpg)
www.ci.anl.govwww.ci.uchicago.edu
20 DTI Pipelines and Cloud Infrastructure
Transfer Comparison
![Page 21: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/21.jpg)
www.ci.anl.govwww.ci.uchicago.edu
21 DTI Pipelines and Cloud Infrastructure
Summary
• Structured pipelines simplify creation, execution and sharing of complex analyses– Hosted as a service can further reduce barriers
• By outsourcing pipeline execution on the Cloud we can reduce overhead and costs– Previously we took weeks to process ~100 scans
o Using this approach < 5 cents a subject ($5 for 1 hour)
• What's next?– Can we deliver this as a service?
o Billing, security, paradigm shift, interactive tools …– Developing toolsheds for sharing tools and pipelines
![Page 22: Www.ci.anl.gov DTI Image Processing Pipeline and Cloud Computing Environment Kyle Chard Computation Institute University of Chicago](https://reader030.vdocuments.mx/reader030/viewer/2022032516/56649c7b5503460f9492e8d1/html5/thumbnails/22.jpg)
www.ci.anl.govwww.ci.uchicago.edu
22 DTI Pipelines and Cloud Infrastructure
Acknowledgements
• Mike Vannier, Xia Jiang, Farid Dahi• Globus Online
– Ian Foster, Steve Tuecke, Rachana Ananthakrishnan• Globus Genomics
o Ravi Madduri, Paul Dave, Dina Sulakhe, Lukasz Lacinski