BIG (NEURO) STATISTICS by, Joshua T. Vogelstein Duke University, Dept. Stats JHU, IDIES (cosmology) Kavli Salon, Here & Now



Outline

• Background & Motivation

• Computational Challenges

• Statistical Challenges

• Summary & Discussion

Neuroscientific Aims

• Lay: “understand” the brain and its relationship to the mind

• Formal: let X = brain and Y = mind:

• construct models: {P_𝛉[X,Y] : 𝛉 ∈ 𝚹}

• estimate parameters: 𝛉* = argmax_𝛉 P_𝛉[X,Y]

• make predictions: y* = argmax_y’ P_𝛉[X=x | Y=y’] P[Y=y’]

• test theories: P_𝛉[X=x | Y=0] ≠ P_𝛉[X=x | Y=1]
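The estimate-then-predict steps above can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not in the slides: X is one-dimensional, each class-conditional P[X | Y=y’] is Gaussian, and the names `fit`, `log_gauss` and `predict` are hypothetical.

```python
import numpy as np

def fit(x, y):
    """Estimate theta by maximum likelihood: per-class mean, variance, prior."""
    theta = {}
    for c in np.unique(y):
        xc = x[y == c]
        theta[int(c)] = (xc.mean(), xc.var(), len(xc) / len(x))
    return theta

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (assumed form of P[X=x | Y=y'])."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def predict(theta, x):
    """Return the y' that maximizes P[X=x | Y=y'] P[Y=y'] (in log space)."""
    return max(
        theta,
        key=lambda c: log_gauss(x, theta[c][0], theta[c][1]) + np.log(theta[c][2]),
    )

# Simulated data: two classes with made-up means of -2 and +2.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
y = np.repeat([0, 1], 200)

theta = fit(x, y)
print(predict(theta, -1.5), predict(theta, 1.5))
```

Working in log space avoids numerical underflow when the densities are small; the argmax is unchanged.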

Motivations

• Y = awesome computational power

• Y = personality type

• Y = psychiatric disorder

(reminder: X = brain, and Y = mind)

other examples: Higgs boson, galaxy, mutant, etc.

Big Data Challenges

Computational        | Statistical
Memory/Storage       | High-Dimensions
Writes & Indexing    | Outliers
Scalable algorithms  | Non-Euclidean


Computational Challenge 1: Data doesn’t fit in memory/disk

Computational Challenge 2: Massive writes & spatial indexing

Computational Challenge 3: Multidimensional

Computational Desiderata

Scalable Computer Vision  | Multidimensional Database
4D alignment              | Seamless integration
Color correction          | Spatial (locality) indexing
Scene segmentation        | Massive writes
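To make "spatial (locality) indexing" concrete, here is a minimal sketch of a Z-order (Morton) key for 3-D voxel coordinates. The slides do not say which indexing scheme is meant; the Morton curve is assumed here only because it is a common way to keep spatially nearby voxels close together in a one-dimensional key space. The function name `morton3` and the 21-bit coordinate width are hypothetical.

```python
def morton3(x, y, z, bits=21):
    """Interleave the bits of (x, y, z) into a single Z-order key.

    Bit i of x lands at position 3*i, bit i of y at 3*i + 1,
    and bit i of z at 3*i + 2.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

# The eight voxels of a 2x2x2 cube map to the contiguous keys 0..7.
keys = sorted(morton3(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1))
print(keys)
```

Sorting or sharding voxels by such a key means that a small cuboid read touches a small number of contiguous key ranges instead of keys scattered across the whole volume.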

Take Home Messages: stuff we lack

Computational                       | Statistical
Scalable Computer Vision            |
Multidimensional Spatial Databases  |

Questions

???


Motivating (Descriptive) Challenge

• Growth charts are useful prognostic tools

• We want “growth” chart for the brain (using ‘objective’ measurements)

• This requires estimating descriptive statistics, such as the mean

Motivating (Predictive) Challenge

• Biomarkers for clinical disorders are useful

• We want biomarkers for various psychiatric disorders

• This requires estimating predictive statistics, such as a classifier

Statistical Taxonomy

        | Descriptive | Predictive | # samples
Little  | x           | x          | d ≪ n
Big     | x           | x          | D ≫ n

for i ∈ {1,2,…,n}

Example Challenges

Descriptive class      | Descriptive exemplars | Predictive class          | Predictive exemplars
location               | mean, median          | two-class classification  | LDA, QDA
scale                  | variance, MAD         | n-class classification    | kNN, CART
matrix factorizations  | eigenvectors, NMF     | regression                | CART, SVR
density estimation     | KDE                   | multivariate regression   | RRR
multimodal fusion      | CCA                   | manifold matching         | JoFC

Challenge 1: High-Dimensions

• simplest descriptive example possible: estimate mean

• little solution: x* = argmin_x Σ_i (x_i - x)^2

• big challenge: x* is inadmissible when d > 2
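The inadmissibility claim can be checked numerically. The sketch below compares the sample mean with a James-Stein shrinkage estimator, the classical estimator that dominates the sample mean under squared-error loss when d > 2. The slides do not name an alternative estimator, so James-Stein is an assumption here, as are the Gaussian data, the known noise variance, and the specific values of `d`, `n` and `mu`.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 50, 10, 2000
mu = np.full(d, 0.5)   # hypothetical true mean
sigma2 = 1.0           # noise variance, assumed known

err_mean = 0.0
err_js = 0.0
for _ in range(trials):
    x = rng.normal(mu, np.sqrt(sigma2), size=(n, d))
    xbar = x.mean(axis=0)  # the "little" solution; xbar ~ N(mu, sigma2/n * I)
    # James-Stein: shrink the sample mean toward the origin.
    shrink = 1.0 - (d - 2) * (sigma2 / n) / np.sum(xbar ** 2)
    js = shrink * xbar
    err_mean += np.sum((xbar - mu) ** 2)
    err_js += np.sum((js - mu) ** 2)

# Average squared error of each estimator over all trials.
print(err_mean / trials, err_js / trials)
```

In this setting the shrunken estimate has the lower average squared error, even though each coordinate is estimated from independent noise.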

Challenge 1: High-Dimensions

• simplest predictive example possible: 2-class classification

• little solution (LDA): y* = argmax_y N(x; m_y, S) p_y

• big challenge: S is singular when D ≫ n
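A quick numerical check of the singularity claim: with n samples in D dimensions, the sample covariance has rank at most n - 1, so it cannot be inverted when D ≫ n. The values of `n`, `D` and the ridge term `lam` are made up for illustration, and adding a ridge is shown only as one common workaround, not as something the slides propose.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 20, 100                      # far fewer samples than dimensions
x = rng.normal(size=(n, D))

S = np.cov(x, rowvar=False)         # D x D sample covariance
rank = np.linalg.matrix_rank(S)
print(S.shape, rank)                # rank is at most n - 1, far below D

# One common workaround (an assumption here): add a small ridge to the diagonal.
lam = 0.1
S_reg = S + lam * np.eye(D)
rank_reg = np.linalg.matrix_rank(S_reg)
print(rank_reg)
```

Because LDA needs the inverse of S inside the Gaussian density, the rank deficiency makes the "little" solution undefined without some form of regularization or dimensionality reduction.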

Challenge 2: Outliers

• simplest descriptive example: estimate mean with outliers

• little solution (median): x* = argmin_x Σ_i |x_i - x|

• big challenge: median is not well-defined when d > 1
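One way to see why the median is problematic for d > 1: the most obvious generalization, the coordinate-wise median, depends on the choice of axes, whereas the mean does not. The three points and the 45-degree rotation below are made up for illustration.

```python
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
a = np.pi / 4
R = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])   # rotation by 45 degrees

# Coordinate-wise median in the original axes.
med = np.median(pts, axis=0)

# Rotate the points, take the coordinate-wise median, rotate the answer back.
med_rot = np.median(pts @ R.T, axis=0)
med_back = med_rot @ R

# Same procedure for the mean, which is unaffected by the rotation.
mean = pts.mean(axis=0)
mean_back = (pts @ R.T).mean(axis=0) @ R

print(med, med_back)     # the two medians disagree
print(mean, mean_back)   # the two means agree
```

Other generalizations (for example the geometric median) avoid this particular problem but give yet another answer, which is the sense in which no single definition is canonical.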

Challenge 2: Outliers

• simplest predictive example: classify with outliers

• little solution: SVM

• big challenge: support vectors are even more singular

Challenge 3: Non-Euclidean

• simplest descriptive example: estimate mean when x ∉ R^d

• little solution: x* = (x_1 + x_2 + … + x_n)/n

• big challenge: no obvious choice for ‘+’, e.g., what is:

A + B = ?
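A small stand-in for the non-Euclidean problem is data on a circle, such as angles. The slides do not say which kind of object A and B are, so this example is an assumption chosen only because it is easy to check by hand: the arithmetic mean of 350° and 10° is 180°, which points the opposite way from both observations. The function name `circular_mean_deg` is hypothetical.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Mean direction: average the unit vectors, then read off the angle."""
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    m = np.arctan2(np.sin(a).mean(), np.cos(a).mean())
    return float(np.rad2deg(m) % 360.0)

angles = [350.0, 10.0]
naive = float(np.mean(angles))      # treats angles as plain real numbers
circ = circular_mean_deg(angles)    # respects the circular geometry

print(naive, circ)
```

The fix works because a sensible notion of averaging was chosen for this particular space; for richer objects, picking that notion is the open question the slide raises.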

Take Home Messages: stuff we lack

Computational                       | Statistical
Scalable Computer Vision            | ‘Default’ Descriptive Theory & Methods
Multidimensional Spatial Databases  | ‘Default’ Predictive Theory & Methods

Acknowledgements

Theory: Carey E. Priebe, Mauro Maggioni, David Dunson

Code: Randal Burns,

Data: M Milham, K Deisseroth, J Lichtman, C Reid, S Smith

Funds: XDATA (DARPA), BIGDATA & CRCNS (NIH/NSF)

Love: yummy, family, friends, earth, universe, multiverse?

e: [email protected], c:443.858.9911 w: jovo.me, openconnecto.me

Quotes

• “All models are wrong, some are useful.” (George Box)

• “Two cultures.” (C. P. Snow & Leo Breiman)

• “Statistics is the art of data collection, analysis & interpretation.” (Huber)