open science resources for `big data' analyses of the human connectome

Open science resources for ‘Big Data’ analyses of the human connectome

Cameron Craddock, PhD

Computational Neuroimaging Lab
Center for Biomedical Imaging and Neuromodulation
Nathan S. Kline Institute for Psychiatric Research

Center for the Developing Brain
Child Mind Institute

The Human Connectome

• The sum total of all of the brain’s connections
– Structural connections: synapses and fibers
    • Diffusion MRI
– Functional connections: synchronized physiological activity
    • Resting state functional MRI
• Nodes are brain areas
• Edges are connections

Craddock et al. Nature Methods, 2013.
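The node-and-edge picture above can be sketched in a few lines. This is a minimal illustration, assuming that a functional edge is summarized by the Pearson correlation between two regions' time series (a common choice, but an assumption here rather than something the slide states). The region names and values are made-up toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time series.

    Assumes neither series is constant (a constant series has zero
    variance and would cause a division by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def functional_connectome(timeseries):
    """Return {(node_a, node_b): r} for every unordered pair of nodes."""
    nodes = sorted(timeseries)
    return {
        (a, b): pearson(timeseries[a], timeseries[b])
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
    }

# Hypothetical toy data: three brain areas (nodes), five time points each.
toy = {
    "region_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "region_B": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises and falls with region_A
    "region_C": [5.0, 3.0, 4.0, 1.0, 2.0],
}
edges = functional_connectome(toy)
for pair, r in edges.items():
    print(pair, round(r, 2))
```

With N nodes this produces N(N-1)/2 edges per scan, which is one reason the data volume grows quickly as parcellations get finer.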

Connectomics is Big Data

Discovery science of human brain function

1. Characterizing inter-individual variation in connectomes (Kelly et al. 2012)
2. Identifying biomarkers of disease state, severity, and prognosis (Craddock 2009)
3. Re-defining mental health in terms of neurophenotypes, e.g. RDoC (Castellanos 2013)

Data is often shared only in its raw form: it must be preprocessed to remove nuisance variation and to make it comparable across individuals and sites.

No consensus on preprocessing

This is particularly complicated for “post-hoc” aggregated datasets

A variety of analyses

The cost of discovery

“Best practice” r-fMRI preprocessing: ~2 hours per subject
Discovery dataset: ~1,000 subjects
“Point and click” processing: 2,000 person hours (1 year)
Scripted processing: 2,000 CPU hours (84 days on a single CPU, down to minutes when run in parallel)
Different derivatives and analyses add time
Different preprocessing strategies scale time
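The arithmetic behind these figures can be checked with a short sketch. The per-subject time and dataset size come from the slide. The 2,000-hour working year and the assumption that subjects are processed independently with perfect scaling are illustrative assumptions.

```python
import math

HOURS_PER_SUBJECT = 2.0      # "best practice" r-fMRI preprocessing (from the slide)
N_SUBJECTS = 1000            # discovery dataset size (from the slide)
WORK_HOURS_PER_YEAR = 2000.0 # assumption: roughly 50 weeks x 40 hours

def total_hours(n_subjects=N_SUBJECTS, hours_each=HOURS_PER_SUBJECT):
    """Total processing time if every subject is handled one after another."""
    return n_subjects * hours_each

def wall_clock_hours(n_subjects, n_workers, hours_each=HOURS_PER_SUBJECT):
    """Elapsed time when subjects run independently on n_workers CPUs.

    Assumes perfect scaling: subjects are processed in batches of
    n_workers, and each batch takes hours_each.
    """
    batches = math.ceil(n_subjects / n_workers)
    return batches * hours_each

print("person-years:", total_hours() / WORK_HOURS_PER_YEAR)
for workers in (1, 100, 1000):
    hours = wall_clock_hours(N_SUBJECTS, workers)
    print(f"{workers:>5} workers: {hours:7.1f} h ({hours / 24:.1f} days)")
```

In this toy model the floor is one subject's ~2 hours. Getting down to minutes would also need parallelism within a single subject's pipeline, which the sketch does not model.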

Configurable Pipeline for the Analysis of Connectomes (CPAC)

• Pipeline to automate preprocessing and analysis of large-scale datasets

• Includes most cutting-edge functional connectivity preprocessing and analysis algorithms

• Configurable to enable “plurality” – evaluate different processing parameters and strategies

• Automatically identifies and takes advantage of parallelism on multi-threaded, multi-core, and cluster architectures

• “Warm restarts” – only re-compute what has changed
• Open science – open source
• http://fcp-indi.github.io

Nipype
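The “warm restart” idea (only re-compute what has changed) can be illustrated with a toy cache keyed on a hash of each step's name, inputs, and parameters. This is a minimal sketch of the general technique, not the mechanism the pipeline itself uses. `CachedPipeline`, `demean`, and `scale` are hypothetical names invented for the example.

```python
import hashlib
import json

class CachedPipeline:
    """Toy runner that re-executes a step only when its fingerprint changes."""

    def __init__(self):
        self.cache = {}     # fingerprint -> stored result
        self.executed = []  # names of steps that actually ran

    def run_step(self, name, func, inputs, params):
        # Fingerprint the step by its name, inputs, and parameters.
        # Inputs and params must be JSON-serializable in this sketch.
        payload = json.dumps([name, inputs, params], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = func(inputs, **params)
            self.executed.append(name)
        return self.cache[key]

def demean(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def scale(values, factor=1.0):
    return [v * factor for v in values]

def run(pipeline, data, factor):
    centered = pipeline.run_step("demean", demean, data, {})
    return pipeline.run_step("scale", scale, centered, {"factor": factor})

pipe = CachedPipeline()
first = run(pipe, [1.0, 2.0, 3.0], factor=2.0)
second = run(pipe, [1.0, 2.0, 3.0], factor=3.0)  # only the scale parameter changed
print(pipe.executed)
```

On the second run the `demean` result is reused and only `scale` executes again, which is the behaviour that makes comparing many parameter settings affordable.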

Consortium for Reliability and Reproducibility (CoRR)

• 33 datasets acquired with a variety of different test-retest designs
– Intra- and inter-session re-tests
– 1629 subjects
– 3357 anatomical MRI scans
– 5093 resting state fMRI scans
– 1302 diffusion MRI scans

http://fcon_1000.projects.nitrc.org/indi/CoRR/html/index.html

Why not share preprocessed data?

• Make data available to a wider audience of researchers

• Evaluate reproducibility of analysis results

http://preprocessed-connectomes-project.github.io/

ADHD-200 Preprocessed

• 374 ADHD & 598 typically developing controls (TDC)
– 7-21 years old
• Two functional pipelines
– Athena: FSL & AFNI, precursor to C-PAC
– NIAK: MINC tools + NIAK using PSOM pipeline
• Structural pipeline
– Burner: SPM5 based VBM

ADHD-200 Preprocessed (2)

• 9,500 downloads from 49 different users
• Athena preprocessed data used by the winning team of the ADHD-200 Global Competition
• 31 peer reviewed publications, 2 dissertations and 1 patent
– (http://www.mendeley.com/groups/4198361/adhd-200-preprocessed/)

Figure 2. Overview of the ADHD-200 Preprocessed audience.

Beijing DTI Preprocessed

• 180 healthy college students
• 55 with Verbal, Performance, and Full IQ
• Preprocessed using FSL
– DTI scalars (FA, MD, etc.)
– Probabilistic Tractography
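For readers unfamiliar with the DTI scalars, the sketch below computes mean diffusivity (MD) and fractional anisotropy (FA) from the three eigenvalues of a diffusion tensor, using the standard textbook definitions. It illustrates what the scalars mean and makes no claim about how the shared derivatives were generated. The eigenvalues are made-up illustrative values.

```python
import math

def mean_diffusivity(l1, l2, l3):
    """MD: the average of the three tensor eigenvalues."""
    return (l1 + l2 + l3) / 3.0

def fractional_anisotropy(l1, l2, l3):
    """FA: 0 for isotropic diffusion, 1 for diffusion along a single axis."""
    md = mean_diffusivity(l1, l2, l3)
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        return 0.0  # no diffusion measured; define FA as 0
    return math.sqrt(1.5 * num / den)

# Illustrative eigenvalues in mm^2/s (hypothetical, elongated tensor).
eigs = (1.7e-3, 0.3e-3, 0.2e-3)
print("MD:", mean_diffusivity(*eigs))
print("FA:", round(fractional_anisotropy(*eigs), 3))
```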

ABIDE Preprocessed indexed by NDAR

• 539 ASD and 573 typical
– 6 – 64 years old
– Some overlap with controls from ADHD-200
• 4 Functional Preprocessing Pipelines
• 4 Preprocessing strategies
– GSR, No-GSR, Filtering, No-Filtering
• 4 Cortical thickness pipelines
– ANTs, CIVET, Freesurfer, Mindboggle
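One way to read the functional side of this release is as a grid: each pipeline crossed with each preprocessing strategy. The sketch below enumerates that grid, assuming the four strategies are the combinations of GSR on/off and filtering on/off (an interpretation of the bullet above, not something the slide spells out). The pipeline names are the teams listed with R-fMRI analyses elsewhere in this deck, and the strategy labels are hypothetical.

```python
from itertools import product

# Teams listed with R-fMRI analyses in this deck.
PIPELINES = ["C-PAC", "CCS", "DPARSF", "NIAK II"]

def strategy_label(gsr, filtering):
    """Hypothetical label for one preprocessing strategy."""
    return f"{'GSR' if gsr else 'No-GSR'} + {'Filtering' if filtering else 'No-Filtering'}"

# Assumption: a strategy is one choice of GSR crossed with one choice of filtering.
STRATEGIES = [strategy_label(g, f) for g, f in product([True, False], repeat=2)]

variants = list(product(PIPELINES, STRATEGIES))
for pipeline, strategy in variants:
    print(f"{pipeline:8s} {strategy}")
print(len(variants), "functional variants per subject")
```

Under that reading, each subject has 16 preprocessed functional variants, which is what lets users test how sensitive a result is to preprocessing choices.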

ABIDE Preprocessed (2)


Team        Tools                      Analyses
C-BRAIN     CIVET, MINC                Cortical Measures
C-PAC       AFNI, ANTs, FSL, Nipype    R-fMRI, VBM, Cortical Measures
CCS         AFNI, Freesurfer, FSL      R-fMRI, VBM, Cortical Measures
DPARSF      DPARSF, REST, SPM          R-fMRI
Mindboggle  Mindboggle                 Cortical Measures
NIAK II     MINC, NIAK, PSOM           R-fMRI

Quality Assessment Protocol

• Spatial Measures
– Contrast to Noise Ratio
– Entropy Focus Criterion
– Foreground to Background Energy Ratio
– Smoothness (FWHM)
– % Artifact Voxels
– Signal-to-Noise Ratio
• Temporal Measures
– Standardized DVARS
– Median distance index
– Mean Functional Displacement
– # Volumes with FD > 0.2 mm
– % Volumes with FD > 0.2 mm

http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
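The FD-based temporal measures reduce to simple summaries of a motion trace. The sketch below is a minimal illustration, assuming FD is a head-motion estimate with one value per fMRI volume, in millimetres. The function name and dictionary keys are hypothetical, not the protocol's actual API, and the input values are toy data.

```python
def motion_summary(fd_mm, threshold_mm=0.2):
    """Summarize a per-volume FD trace (values in mm).

    Returns the mean FD plus the count and percentage of volumes
    whose FD exceeds threshold_mm.
    """
    if not fd_mm:
        raise ValueError("FD trace is empty")
    n_over = sum(1 for fd in fd_mm if fd > threshold_mm)
    return {
        "mean_fd": sum(fd_mm) / len(fd_mm),
        "num_over": n_over,
        "pct_over": 100.0 * n_over / len(fd_mm),
    }

# Hypothetical FD values for a five-volume scan.
summary = motion_summary([0.05, 0.10, 0.30, 0.25, 0.10])
print(summary)
```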

Quality Assessment Protocol (2)

• Implemented in Python
• Normative datasets to help learn thresholds for quality control
– ABIDE
– CoRR

http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
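One simple way a normative dataset could inform a quality-control threshold is to take a high percentile of a measure's normative distribution as the cutoff. The slides do not say how thresholds are learned, so the percentile rule below is purely an illustrative assumption. The subject IDs and values are made up.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

def flag_outliers(normative, candidates, pct=95):
    """Return (cutoff, names of candidates whose value exceeds the cutoff).

    Assumes higher values mean worse quality, as with head motion.
    """
    cutoff = percentile(normative, pct)
    flagged = [name for name, value in candidates.items() if value > cutoff]
    return cutoff, flagged

# Hypothetical normative mean-FD values (mm) and two new scans to screen.
normative_mean_fd = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.40]
new_scans = {"sub-01": 0.09, "sub-02": 0.55}
cutoff, flagged = flag_outliers(normative_mean_fd, new_scans)
print("cutoff:", round(cutoff, 3), "flagged:", flagged)
```

For measures where lower is worse (for example signal-to-noise ratio), the comparison would need to be reversed and a low percentile used instead.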

Regional Brainhacks

• One event that linked 8 cities, 3 countries, 2 continents
– Ann Arbor
– Boston
– Miami
– Montreal
– New York City
– Porto Alegre, Brazil
– Toronto
– Washington DC

Acknowledgements

CPAC Team: Daniel Clark, Steven Giavasis and Michael Milham.

Quality Assessment Protocol: Zarrar Shehzad, Daniel Lurie, Steven Giavasis, and Sang Han Lee.

ABIDE Preprocessed: Pierre Bellec, Yassine Benhajali, Francois Chouinard, Daniel Clark, R. Cameron Craddock, Alan Evans, Steven Giavasis, Budhachandra Khundrakpam, John Lewis, Qingyang Li, Zarrar Shehzad, Aimi Watanabe, Ting Xu, Chao-Gan Yan, Zhen Yang, Xinian Zuo, and the ABIDE consortium.

Brainhack Organizers: Pierre Bellec, Daniel Margulies, Maarten Mennes, Donald McLauren, Satra Ghosh, Matt Hutchison, Robert Welsh, Scott Peltier, Jonathan Downer, Stephen Strother, Katie Dunlop, Angie Laird, Lucina Uddin, Benjamin De Leener, Julien Cohen-Adad, Andrew Gerber, Alex Franco, Caroline Froehlich, Felipe Meneguzzi, John VanMeter, Lei Liew, Ziad Saad, Prantik Kundu

CPAC-NDAR integration was funded by a contract from NDAR.

ABIDE Preprocessed data is hosted in a Public S3 Bucket provided by AWS.
