Open science resources for ‘Big Data’ analyses
of the human connectome
Cameron Craddock, PhD
Computational Neuroimaging Lab
Center for Biomedical Imaging and Neuromodulation
Nathan S. Kline Institute for Psychiatric Research
Center for the Developing Brain
Child Mind Institute
The Human Connectome
• The sum total of all of the brain’s connections
– Structural connections: synapses and fibers
• Diffusion MRI
– Functional connections: synchronized physiological activity
• Resting state functional MRI
• Nodes are brain areas
• Edges are connections
Craddock et al. Nature Methods, 2013.
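The node-and-edge picture above can be sketched in a few lines. This is a minimal illustration, assuming that "synchronized physiological activity" is measured as the Pearson correlation between regional time series and that an edge is kept when the absolute correlation reaches a cutoff. The area names, the toy data, and the 0.5 cutoff are hypothetical and do not come from the talk.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def functional_connectome(timeseries, threshold=0.5):
    """Return edges {(area_a, area_b): r} for area pairs with |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(timeseries), 2):
        r = pearson(timeseries[a], timeseries[b])
        if abs(r) >= threshold:   # hypothetical cutoff, not from the talk
            edges[(a, b)] = r
    return edges


# Toy data: three hypothetical brain areas (nodes), six time points each.
toy = {
    "area_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "area_B": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],   # rises and falls with area_A
    "area_C": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # oscillates independently
}
edges = functional_connectome(toy)
print(edges)
```

With these toy values only the area_A/area_B pair passes the cutoff, so the graph has three nodes and one edge.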
Connectomics is Big Data
Discovery science of human brain function
1. Characterizing inter-individual variation in connectomes (Kelly et al. 2012)
2. Identifying biomarkers of disease state, severity, and prognosis (Craddock 2009)
3. Re-defining mental health in terms of neurophenotypes, e.g. RDoC (Castellanos 2013)
Data is often shared only in its raw form – must be preprocessed to remove nuisance variation and to be made comparable across individuals and sites.
No consensus on preprocessing
This is particularly complicated for “post-hoc” aggregated datasets
A variety of analyses
The cost of discovery
“Best practice” r-fMRI preprocessing: ~2 hours per subject
Discovery dataset: ~1,000 subjects
“Point and click” processing: 2,000 person hours (1 year)
Scripted processing: 2,000 CPU hours (84 days on a single CPU, down to minutes with enough parallelism)
Different derivatives and analyses add time
Different preprocessing strategies scale time
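The figures on this slide follow from simple arithmetic, sketched below. The 2 hours per subject and 1,000 subjects are from the slide. The 50-week, 40-hour working year and the ideal, evenly divided parallel scaling are assumptions made only for this illustration.

```python
HOURS_PER_SUBJECT = 2     # "best practice" r-fMRI preprocessing, from the slide
N_SUBJECTS = 1_000        # discovery dataset size, from the slide

total_hours = HOURS_PER_SUBJECT * N_SUBJECTS   # 2,000 hours of work in total


def wall_clock_days(total_cpu_hours, n_cores):
    """Wall-clock days if the work divides evenly over n_cores (assumed ideal scaling)."""
    return total_cpu_hours / n_cores / 24


# Assumption: one person-year is roughly 50 working weeks of 40 hours.
person_years = total_hours / (50 * 40)
print(f"point and click: {person_years:.1f} person-years")

for cores in (1, 100, 2_000):
    print(f"{cores:>5} cores: {wall_clock_days(total_hours, cores):.2f} days")
```

Under this idealized model a single core needs a little over 83 days. Spreading subjects across cores cannot push the wall-clock time below the roughly 2 hours one subject takes, so getting down to minutes would also require parallelism within each subject's processing.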
Configurable Pipeline for the Analysis of Connectomes (CPAC)
• Pipeline to automate preprocessing and analysis of large-scale datasets
• Most cutting edge functional connectivity preprocessing and analysis algorithms
• Configurable to enable “plurality” – evaluate different processing parameters and strategies
• Automatically identifies and takes advantage of parallelism on multi-threaded, multi-core, and cluster architectures
• “Warm restarts” – only re-compute what has changed
• Open science – open source
• http://fcp-indi.github.io
Nipype
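The "warm restart" idea, re-computing only what has changed, can be illustrated with a toy cache. This is a sketch of the general technique and not the pipeline's actual mechanism: each step is keyed by a hash of its name, its parameters, and the keys of its inputs, so changing one parameter invalidates only that step and anything downstream of it. The class name `StepCache` and the three toy steps are hypothetical.

```python
import hashlib
import json


class StepCache:
    """Toy cache keyed by a hash of a step's name, parameters, and input keys."""

    def __init__(self):
        self.results = {}   # cache key -> stored result
        self.runs = []      # names of steps that were actually executed

    def run(self, name, func, params, *inputs):
        # The key changes if the step's parameters or any upstream key changes.
        payload = json.dumps([name, params, [key for key, _ in inputs]], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in self.results:
            self.runs.append(name)
            self.results[key] = func(*[value for _, value in inputs], **params)
        return key, self.results[key]


def pipeline(cache, width):
    """Three hypothetical steps chained together; only `width` varies between runs."""
    raw = cache.run("load", lambda: [1.0, 2.0, 3.0, 4.0], {})
    scaled = cache.run("scale", lambda x, factor: [v * factor for v in x],
                       {"factor": 2.0}, raw)
    return cache.run("smooth", lambda x, width: [v / width for v in x],
                     {"width": width}, scaled)


cache = StepCache()
pipeline(cache, width=2.0)          # cold start: every step runs
first_runs = list(cache.runs)
cache.runs.clear()
pipeline(cache, width=4.0)          # warm restart: only the changed step re-runs
second_runs = list(cache.runs)
print(first_runs, second_runs)
```

On the second call the "load" and "scale" results are reused from the cache and only "smooth" is executed again.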
• 33 datasets acquired with a variety of different test-retest designs
– Intra- and inter-session re-tests
– 1629 subjects
– 3357 anatomical MRI scans
– 5093 resting state fMRI scans
– 1302 diffusion MRI scans
http://fcon_1000.projects.nitrc.org/indi/CoRR/html/index.html
Why not share preprocessed data?
• Make data available to a wider audience of researchers
• Evaluate reproducibility of analysis results
http://preprocessed-connectomes-project.github.io/
ADHD-200 Preprocessed
• 374 ADHD & 598 TDC
– 7-21 years old
• Two functional pipelines
– Athena: FSL & AFNI, precursor to C-PAC
– NIAK: MINC tools + NIAK using PSOM pipeline
• Structural pipeline
– Burner: SPM5 based VBM
ADHD-200 Preprocessed (2)
• 9,500 downloads from 49 different users
• Athena preprocessed data used by the winning team of the ADHD-200 Global Competition
• 31 peer reviewed publications, 2 dissertations and 1 patent
– (http://www.mendeley.com/groups/4198361/adhd-200-preprocessed/)
Figure 2. Overview of the ADHD-200 Preprocessed audience.
Beijing DTI Preprocessed
• 180 healthy college students
• 55 with Verbal, Performance, and Full IQ
• Preprocessed using FSL
– DTI scalars (FA, MD, etc.)
– Probabilistic Tractography
ABIDE Preprocessed indexed by NDAR
• 539 ASD and 573 typical
– 6 – 64 years old
– Some overlap with controls from ADHD-200
• 4 Functional Preprocessing Pipelines
• 4 Preprocessing strategies
– GSR, No-GSR, Filtering, No-Filtering
• 4 Cortical thickness pipelines
– ANTS, CIVET, Freesurfer, Mindboggle
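One way to read the "4 Preprocessing strategies" bullet is as the crossing of two on/off choices, global signal regression and temporal filtering. The sketch below enumerates that grid under that assumption and crosses it with the four teams listed for R-fMRI in this deck (C-PAC, CCS, DPARSF, NIAK II). The label format is made up for illustration and is not the naming used in the released files.

```python
from itertools import product

# The two binary choices named on the slide.
GSR_OPTIONS = ("GSR", "No-GSR")
FILTER_OPTIONS = ("Filtering", "No-Filtering")

# Teams listed for R-fMRI analyses elsewhere in this deck.
FUNCTIONAL_PIPELINES = ("C-PAC", "CCS", "DPARSF", "NIAK II")

# Assumption: the four strategies are the 2 x 2 crossing of the two choices.
strategies = [f"{gsr} + {filt}" for gsr, filt in product(GSR_OPTIONS, FILTER_OPTIONS)]

# Every pipeline run under every strategy gives one set of functional derivatives.
variants = [(pipe, strat) for pipe, strat in product(FUNCTIONAL_PIPELINES, strategies)]

for strat in strategies:
    print(strat)
print(len(variants), "pipeline/strategy combinations")
```

Under this reading a user comparing results across all pipelines and strategies has 16 versions of each functional derivative to choose from.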
ABIDE Preprocessed (2)
Team        Tools                     Analyses
C-BRAIN     CIVET, MINC               Cortical Measures
C-PAC       AFNI, ANTs, FSL, Nipype   R-fMRI, VBM, Cortical Measures
CCS         AFNI, Freesurfer, FSL     R-fMRI, VBM, Cortical Measures
DPARSF      DPARSF, REST, SPM         R-fMRI
Mindboggle  Mindboggle                Cortical Measures
NIAK II     MINC, NIAK, PSOM          R-fMRI
Quality Assessment Protocol
• Spatial Measures
– Contrast to Noise Ratio
– Entropy Focus Criterion
– Foreground to Background Energy Ratio
– Smoothness (FWHM)
– % Artifact Voxels
– Signal-to-Noise Ratio
• Temporal Measures
– Standardized DVARS
– Median distance index
– Mean Functional Displacement
– # Volumes with FD > 0.2 mm
– % Volumes with FD > 0.2 mm
http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
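The FD-based temporal measures reduce to a mean, a count, and a percentage over a displacement trace. The sketch below assumes FD is available as one value per time point, in millimetres, and that the 0.2 threshold is in millimetres. The function name, dictionary keys, and example values are hypothetical and are not taken from the protocol's code.

```python
FD_THRESHOLD_MM = 0.2   # threshold named on the slide (assumed to be millimetres)


def fd_summary(fd_trace, threshold=FD_THRESHOLD_MM):
    """Summarize an FD trace: mean, count above threshold, percent above threshold."""
    n_over = sum(1 for fd in fd_trace if fd > threshold)
    return {
        "mean_fd": sum(fd_trace) / len(fd_trace),
        "num_over": n_over,
        "pct_over": 100.0 * n_over / len(fd_trace),
    }


# Hypothetical FD values in mm, one per time point.
example_fd = [0.05, 0.10, 0.35, 0.08, 0.50, 0.12, 0.07, 0.20]
summary = fd_summary(example_fd)
print(summary)
```

The comparison is strict, so a time point sitting exactly at the threshold is not counted. Whether the boundary should be inclusive is a design choice this sketch does not settle.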
Quality Assessment Protocol (2)
• Implemented in Python
• Normative datasets to help learn thresholds for quality control
– ABIDE
– CoRR
http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/
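Learning a quality-control threshold from a normative dataset could look like the sketch below. It uses a percentile cutoff, which is one simple, hypothetical rule; the slides do not say how thresholds are meant to be derived. The measure, its values, the 95th percentile, and the subject IDs are all made up for illustration.

```python
from statistics import quantiles


def learn_threshold(normative_values, percentile=95):
    """Return the given percentile of a normative distribution as a cutoff."""
    cut_points = quantiles(normative_values, n=100, method="inclusive")
    return cut_points[percentile - 1]


def flag_scans(scans, threshold):
    """Return the IDs of scans whose measure exceeds the learned threshold."""
    return sorted(scan_id for scan_id, value in scans.items() if value > threshold)


# Hypothetical normative values of one quality measure (higher means worse).
normative = [0.05 + 0.01 * i for i in range(101)]
threshold = learn_threshold(normative)

# Hypothetical new scans scored on the same measure.
new_scans = {"sub-01": 0.30, "sub-02": 1.20, "sub-03": 0.99}
flagged = flag_scans(new_scans, threshold)
print(threshold, flagged)
```

A percentile rule assumes the normative sample is itself mostly of acceptable quality. For measures where lower values are worse, the comparison and the percentile would be flipped.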
Regional Brainhacks
• One event that linked 8 Cities, 3 Countries, 2 continents
– Ann Arbor
– Boston
– Miami
– Montreal
– New York City
– Porto Alegre, Brazil
– Toronto
– Washington DC
Acknowledgements
CPAC Team: Daniel Clark, Steven Giavasis and Michael Milham.
Quality Assessment Protocol: Zarrar Shehzad, Daniel Lurie, Steven Giavasis, and Sang Han Lee.
ABIDE Preprocessed: Pierre Bellec, Yassine Benhajali, Francois Chouinard, Daniel Clark, R. Cameron Craddock, Alan Evans, Steven Giavasis, Budhachandra Khundrakpam, John Lewis, Qingyang Li, Zarrar Shehzad, Aimi Watanabe, Ting Xu, Chao-Gan Yan, Zhen Yang, Xinian Zuo, and the ABIDE consortium.
Brainhack Organizers: Pierre Bellec, Daniel Margulies, Maarten Mennes, Donald McLaren, Satra Ghosh, Matt Hutchison, Robert Welsh, Scott Peltier, Jonathan Downer, Stephen Strother, Katie Dunlop, Angie Laird, Lucina Uddin, Benjamin De Leener, Julien Cohen-Adad, Andrew Gerber, Alex Franco, Caroline Froehlich, Felipe Meneguzzi, John VanMeter, Lei Liew, Ziad Saad, Prantik Kundu
CPAC-NDAR integration was funded by a contract from NDAR.
ABIDE Preprocessed data is hosted in a Public S3 Bucket provided by AWS.