Open science resources for ‘Big Data’ analyses of the human connectome

Open science resources for ‘Big Data’ analyses of the human connectome. Cameron Craddock, PhD. Computational Neuroimaging Lab, Center for Biomedical Imaging and Neuromodulation, Nathan S. Kline Institute for Psychiatric Research; Center for the Developing Brain, Child Mind Institute.

Upload: cameron-craddock

Post on 09-Jan-2017


TRANSCRIPT

Page 1: Open science resources for `Big Data' Analyses of the human connectome

Open science resources for ‘Big Data’ analyses of the human connectome

Cameron Craddock, PhD

Computational Neuroimaging Lab
Center for Biomedical Imaging and Neuromodulation
Nathan S. Kline Institute for Psychiatric Research

Center for the Developing Brain
Child Mind Institute

Page 2: The Human Connectome

• The sum total of all of the brain’s connections
  – Structural connections: synapses and fibers
    • Diffusion MRI
  – Functional connections: synchronized physiological activity
    • Resting state functional MRI
• Nodes are brain areas
• Edges are connections

Craddock et al. Nature Methods, 2013.
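
The node-and-edge description above can be illustrated with a small sketch. It is not taken from the talk: the time series are synthetic, the use of Pearson correlation as the edge weight is one common choice for "synchronized activity", and the 0.5 threshold is an arbitrary value picked for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_timepoints, n_regions = 200, 5

# Synthetic stand-in for regional resting state fMRI time series
# (rows are time points, columns are brain areas, i.e. nodes).
timeseries = rng.standard_normal((n_timepoints, n_regions))

# Give regions 0 and 1 a shared signal so that they appear "connected".
shared = rng.standard_normal(n_timepoints)
timeseries[:, 0] += 2.0 * shared
timeseries[:, 1] += 2.0 * shared

# Edge weights: Pearson correlation between every pair of regions.
connectivity = np.corrcoef(timeseries, rowvar=False)

# Keep only strong edges. The threshold is an assumption for illustration.
threshold = 0.5
adjacency = np.abs(connectivity) > threshold
np.fill_diagonal(adjacency, False)

# List each undirected edge once (upper triangle only).
edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(adjacency)))]
print(edges)
```

With real data the columns would be region-averaged time series from a preprocessed scan; the rest of the sketch would stay the same.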

Page 3: Connectomics is Big Data

Page 4: Discovery science of human brain function

1. Characterizing inter-individual variation in connectomes (Kelly et al. 2012)
2. Identifying biomarkers of disease state, severity, and prognosis (Craddock 2009)
3. Re-defining mental health in terms of neurophenotypes, e.g. RDoC (Castellanos 2013)

Data are often shared only in raw form and must be preprocessed to remove nuisance variation and to be made comparable across individuals and sites.

Page 5: No consensus on preprocessing

This is particularly complicated for “post-hoc” aggregated datasets

Page 6: A variety of analyses

Page 7: The cost of discovery

“Best practice” r-fMRI preprocessing: ~2 hours per subject

Discovery dataset: ~1,000 subjects

“Point and click” processing: 2,000 person hours (1 year)

Scripted processing: 2,000 CPU hours (84 days on a single CPU, down to minutes when run in parallel)

Different derivatives and analyses add time

Different preprocessing strategies scale time
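
The figures on this slide follow from simple arithmetic, sketched below. The 40-hour week, the 50 working weeks per year, and the perfectly linear speed-up across cores are assumptions made for the example; real pipelines scale less cleanly.

```python
import math

hours_per_subject = 2      # "best practice" r-fMRI preprocessing
n_subjects = 1000          # discovery dataset

total_hours = hours_per_subject * n_subjects
print(total_hours)         # total person hours or CPU hours

# Point and click: one person, assuming 40 h/week and 50 weeks/year.
person_years = total_hours / (40 * 50)
print(person_years)

# Scripted, single CPU running around the clock.
serial_days = total_hours / 24
print(math.ceil(serial_days))


def wall_clock_minutes(total_cpu_hours, n_cores):
    """Idealized wall-clock time, assuming the work divides evenly across cores."""
    return 60 * total_cpu_hours / n_cores


# Hypothetical core counts, chosen only to show how the time shrinks.
for n_cores in (1, 100, 10_000):
    print(n_cores, wall_clock_minutes(total_hours, n_cores))
```

Getting below the ~2 hours needed for one subject requires parallelism within a subject's processing as well as across subjects.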

Page 8: Configurable Pipeline for the Analysis of Connectomes (CPAC)

• Pipeline to automate preprocessing and analysis of large-scale datasets
• Most cutting-edge functional connectivity preprocessing and analysis algorithms
• Configurable to enable “plurality”: evaluate different processing parameters and strategies
• Automatically identifies and takes advantage of parallelism on multi-threaded, multi-core, and cluster architectures
• “Warm restarts”: only re-compute what has changed
• Open science, open source
• http://fcp-indi.github.io

Nipype
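
The "warm restart" idea can be sketched with a toy cache that re-runs a step only when its name, parameters, or inputs change. This is an illustration of the general technique, not C-PAC's implementation; the class `WarmRestartCache` and the steps `demean` and `scale` are hypothetical names invented for the example.

```python
import hashlib
import json


class WarmRestartCache:
    """Toy cache: a step is computed only if this exact request is new."""

    def __init__(self):
        self._results = {}
        self.executed = []  # names of steps that were actually computed

    def run(self, name, func, params, upstream):
        # The key covers everything that could change the step's result.
        key_source = json.dumps(
            {"name": name, "params": params, "upstream": upstream},
            sort_keys=True,
        )
        key = hashlib.sha256(key_source.encode()).hexdigest()
        if key not in self._results:
            self._results[key] = func(upstream, **params)
            self.executed.append(name)
        return self._results[key]


def demean(data):
    mean = sum(data) / len(data)
    return [x - mean for x in data]


def scale(data, factor):
    return [x * factor for x in data]


def pipeline(cache, data, factor):
    centered = cache.run("demean", demean, {}, data)
    return cache.run("scale", scale, {"factor": factor}, centered)


cache = WarmRestartCache()
first = pipeline(cache, [1.0, 2.0, 3.0], factor=2.0)
# Changing only the downstream parameter re-runs "scale" but not "demean".
second = pipeline(cache, [1.0, 2.0, 3.0], factor=3.0)
print(cache.executed)
```

A production pipeline would persist the cache to disk and hash file contents rather than in-memory lists, but the decision rule is the same.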

Page 9

• 33 datasets acquired with a variety of different test-retest designs
  – Intra- and inter-session re-tests
  – 1629 subjects
  – 3357 anatomical MRI scans
  – 5093 resting state fMRI scans
  – 1302 diffusion MRI scans

http://fcon_1000.projects.nitrc.org/indi/CoRR/html/index.html

Page 10: Why not share preprocessed data?

• Make data available to a wider audience of researchers

• Evaluate reproducibility of analysis results

http://preprocessed-connectomes-project.github.io/

Page 11: ADHD-200 Preprocessed

• 374 ADHD & 598 TDC
  – 7-21 years old
• Two functional pipelines
  – Athena: FSL & AFNI, precursor to C-PAC
  – NIAK: MINC tools + NIAK using PSOM pipeline
• Structural pipeline
  – Burner: SPM5 based VBM

Page 12: ADHD-200 Preprocessed (2)

• 9,500 downloads from 49 different users
• Athena preprocessed data used by winning team of the ADHD-200 Global Competition
• 31 peer reviewed publications, 2 dissertations and 1 patent
  – (http://www.mendeley.com/groups/4198361/adhd-200-preprocessed/)

Figure 2. Overview of the ADHD-200 Preprocessed audience.

Page 13: Beijing DTI Preprocessed

• 180 healthy college students
• 55 with Verbal, Performance, and Full IQ
• Preprocessed using FSL
  – DTI scalars (FA, MD, etc.)
  – Probabilistic Tractography

Page 14: ABIDE Preprocessed indexed by NDAR

• 539 ASD and 573 typical
  – 6-64 years old
  – Some overlap with controls from ADHD-200
• 4 Functional Preprocessing Pipelines
• 4 Preprocessing strategies
  – GSR, No-GSR, Filtering, No-Filtering
• 4 Cortical thickness pipelines
  – ANTS, CIVET, Freesurfer, Mindboggle
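
One reading of the "4 preprocessing strategies" is that they are the four on/off combinations of global signal regression (GSR) and temporal filtering. The sketch below enumerates them and crosses them with four functional pipelines. The pipeline names are assumed to be the teams listed with R-fMRI analyses on the next slide, and the dictionary keys are hypothetical labels.

```python
from itertools import product

# Assumed from the team table on the following slide (teams with R-fMRI).
functional_pipelines = ["C-PAC", "CCS", "DPARSF", "NIAK"]

# Two binary choices give the four strategies.
strategies = [
    {"gsr": gsr, "filtering": filtering}
    for gsr, filtering in product([True, False], repeat=2)
]

# Every pipeline processed with every strategy.
variants = list(product(functional_pipelines, strategies))

for pipeline_name, strategy in variants:
    print(pipeline_name, strategy)
print(len(strategies), len(variants))
```

Under this reading, each functional scan is available in 16 preprocessed variants, which is what lets users compare results across processing choices.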

Page 15: ABIDE Preprocessed (2)

Team        Tools                      Analyses
C-BRAIN     CIVET, MINC                Cortical Measures
C-PAC       AFNI, ANTs, FSL, Nipype    R-fMRI, VBM, Cortical Measures
CCS         AFNI, Freesurfer, FSL      R-fMRI, VBM, Cortical Measures
DPARSF      DPARSF, REST, SPM          R-fMRI
Mindboggle  Mindboggle                 Cortical Measures
NIAK II     MINC, NIAK, PSOM           R-fMRI

Page 16: Quality Assessment Protocol

• Spatial Measures
  – Contrast to Noise Ratio
  – Entropy Focus Criterion
  – Foreground to Background Energy Ratio
  – Smoothness (FWHM)
  – % Artifact Voxels
  – Signal-to-Noise Ratio
• Temporal Measures
  – Standardized DVARS
  – Median distance index
  – Mean Functional Displacement
  – # Volumes with FD > 0.2 mm
  – % Volumes with FD > 0.2 mm

http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
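
As an illustration of the FD-based temporal measures, the sketch below computes framewise displacement (FD) from six rigid-body motion parameters and summarizes it against a 0.2 mm threshold. It uses the simple sum-of-absolute-differences formulation with a 50 mm head radius, which is an assumption for the example and may differ from the formulation used by the protocol itself. The function names and dictionary keys are hypothetical.

```python
import numpy as np


def framewise_displacement(motion, head_radius_mm=50.0):
    """FD per time point from an (n_volumes, 6) array.

    Columns 0-2 are translations in mm; columns 3-5 are rotations in radians.
    """
    motion = np.asarray(motion, dtype=float)
    deltas = np.abs(np.diff(motion, axis=0))
    # Convert rotations to arc length on a sphere of the assumed radius.
    deltas[:, 3:] *= head_radius_mm
    fd = deltas.sum(axis=1)
    # The first time point has no predecessor, so its FD is set to zero.
    return np.concatenate([[0.0], fd])


def fd_summary(fd, threshold_mm=0.2):
    """Mean FD plus the count and percentage of time points above threshold."""
    n_above = int(np.sum(fd > threshold_mm))
    return {
        "mean_fd": float(fd.mean()),
        "num_fd": n_above,
        "perc_fd": 100.0 * n_above / fd.size,
    }


# Synthetic example: a single 0.5 mm jump in x that returns to baseline.
motion = np.zeros((5, 6))
motion[2, 0] = 0.5
fd = framewise_displacement(motion)
print(fd_summary(fd))
```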

Page 17: Quality Assessment Protocol (2)

• Implemented in Python
• Normative datasets to help learn thresholds for quality control
  – ABIDE
  – CoRR

http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
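
One simple way to "learn" a quality-control threshold from a normative dataset is an outlier fence on the distribution of a measure. The sketch below uses an upper fence of Q3 + 1.5 × IQR. This rule is an assumption chosen for illustration, not a method stated in the talk, and the normative values are synthetic rather than drawn from ABIDE or CoRR.

```python
import numpy as np


def learn_upper_threshold(normative_values, k=1.5):
    """Upper outlier fence: third quartile plus k times the interquartile range."""
    q1, q3 = np.percentile(normative_values, [25, 75])
    return q3 + k * (q3 - q1)


def flag_outliers(values, threshold):
    """Indices of scans whose measure exceeds the learned threshold."""
    return [i for i, value in enumerate(values) if value > threshold]


# Synthetic stand-in for a normative distribution of a measure such as mean FD.
normative = np.linspace(0.05, 0.25, 101)
threshold = learn_upper_threshold(normative)

# Hypothetical new scans to screen against the learned threshold.
new_scans = [0.12, 0.50, 0.30]
print(threshold, flag_outliers(new_scans, threshold))
```

For measures where low values indicate poor quality, such as signal-to-noise ratio, the same idea applies with a lower fence.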

Page 18

Page 19: Regional Brainhacks

• One event that linked 8 cities, 3 countries, 2 continents
  – Ann Arbor
  – Boston
  – Miami
  – Montreal
  – New York City
  – Porto Alegre, Brazil
  – Toronto
  – Washington DC

Page 20: Acknowledgements

CPAC Team: Daniel Clark, Steven Giavasis and Michael Milham.

Quality Assessment Protocol: Zarrar Shehzad, Daniel Lurie, Steven Giavasis, and Sang Han Lee.

ABIDE Preprocessed: Pierre Bellec, Yassine Benhajali, Francois Chouinard, Daniel Clark, R. Cameron Craddock, Alan Evans, Steven Giavasis, Budhachandra Khundrakpam, John Lewis, Qingyang Li, Zarrar Shehzad, Aimi Watanabe, Ting Xu, Chao-Gan Yan, Zhen Yang, Xinian Zuo, and the ABIDE consortium.

Brainhack Organizers: Pierre Bellec, Daniel Margulies, Maarten Mennes, Donald McLauren, Satra Ghosh, Matt Hutchison, Robert Welsh, Scott Peltier, Jonathan Downer, Stephen Strother, Katie Dunlop, Angie Laird, Lucina Uddin, Benjamin De Leener, Julien Cohen-Adad, Andrew Gerber, Alex Franco, Caroline Froehlich, Felipe Meneguzzi, John VanMeter, Lei Liew, Ziad Saad, Prantik Kundu

CPAC-NDAR integration was funded by a contract from NDAR.

ABIDE Preprocessed data is hosted in a Public S3 Bucket provided by AWS.