

Examples of High-Dimensional Data

Genevera I. Allen

Statistical Learning: High-Dimensional Data

January 10, 2011

(Stat 699) High-Dimensional Data January 10, 2011 1 / 16


1 Large p, Small n: Biological Data

2 Random Fields Data

3 Collaborative Filtering Data


Example: Microarrays

Measure gene expression.

Often tens of thousands of genes (features).

Only tens or hundreds of samples.

[Figure: gene expression data matrix, axes labeled arrays and genes]


Review: Microarrays

(Stears, R. L. et al., 2003)


Statistical Questions: Microarrays

Data pre-processing:
  - Normalization.
  - Missing data imputation.

Inference:
  - Which genes are significant?

Clustering:
  - Groups of genes, groups of samples.

Model Building:
  - Small n, large p.

[Figure: gene expression data matrix, axes labeled arrays and genes]
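As a minimal sketch of why "small n, large p" complicates model building, the Python snippet below simulates a data matrix with far more features than samples. The sizes (20 samples, 1000 genes), the random data, and the variable names are all assumptions chosen for illustration; they do not come from any real microarray study. With p > n, the matrix X^T X has rank at most n, so ordinary least squares has no unique solution, and a least squares fit can reproduce even a pure-noise response exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 20 arrays (samples) and 1000 genes (features).
n, p = 20, 1000
X = rng.standard_normal((n, p))  # here rows = samples, columns = genes
y = rng.standard_normal(n)       # an arbitrary response made of pure noise

# rank(X^T X) equals rank(X), which cannot exceed n. So the p x p matrix
# X^T X is singular whenever p > n and the normal equations have no
# unique solution.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm least squares solution. With p > n it
# interpolates y, even though y is unrelated to X.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
max_residual = np.max(np.abs(X @ beta - y))

print("rank of X:", rank, "(at most n =", n, ")")
print("largest absolute residual:", max_residual)
```

A perfect in-sample fit to noise is one way to see why regularization or feature selection is usually needed in this setting.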


Other Types of Biological Data

Genetics:

Deep Sequencing - Counts.

Micro RNA Expression - Continuous.

CGH (Copy Number Variation) - Continuous / Categorical.

SNPs (Single Nucleotide Polymorphisms) - Binary / Categorical.

Methylation - Continuous.


Other Types of Biological Data

Proteomics / Metabolomics (Chemometrics):

(H-NMR) Measures the chemical shift associated with various metabolites.


2 Random Fields Data


Example: Functional MRIs (fMRI)

Rows: Voxels.

Columns: Subjects (and/or replicates and time points).

Measurement: Hemodynamic response (change in blood flow).

[Figure: fMRI images of slices 15, 16, and 17]


Review: fMRIs

(Heeger & Ress, 2002)


Statistical Questions: fMRIs

Inference:
  - Which voxels are significant?
  - Which groups of voxels (regions of interest) are significant?

Clustering:
  - Groups of voxels that behave similarly - finding regions of interest.

Networks (Functional Connectivity):
  - How are voxels or groups of voxels related to each other?
  - How are voxels or groups of voxels related through time?
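One simple way to make the functional connectivity question concrete is to correlate voxel time series with each other. The Python sketch below does this on simulated data. The number of voxels, the shared signal, and the 0.5 cutoff are all hypothetical choices for illustration, and pairwise correlation is only one of many possible connectivity measures, not a method prescribed by these slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 6 voxels observed at 200 time points.
n_voxels, n_times = 6, 200
signal = rng.standard_normal(n_times)            # a shared underlying signal
data = rng.standard_normal((n_voxels, n_times))  # independent noise per voxel
data[:3] += 2.0 * signal                         # voxels 0-2 share the signal

# np.corrcoef treats each row as one variable, so the result is a
# voxel-by-voxel correlation matrix.
corr = np.corrcoef(data)

# Threshold the absolute correlations to get a crude connectivity graph,
# excluding the diagonal (each voxel with itself).
threshold = 0.5  # arbitrary cutoff chosen for illustration
adjacency = (np.abs(corr) > threshold) & ~np.eye(n_voxels, dtype=bool)

print(np.round(corr, 2))
print(adjacency.astype(int))
```

With real fMRI data the number of voxels is far larger than the number of time points, so the sample correlation matrix is itself a high-dimensional object that is hard to estimate well.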


Others

Finance.
  - Time Series Data.

Climate Data.
  - Spatial Data.
  - Spatio-temporal Data.

Neuroimaging.
  - DTI - Diffusion Tensor Imaging.
  - Calcium-Fluorescence Imaging.
  - EEG & MEG.


3 Collaborative Filtering Data


Example: Netflix Movie Rating Data

Rows: Movies.

Columns: Customers.

Measurement: Movie ratings (scale of 1 - 5).

                     Anne   Ben   Charlie   Doug   Eve   ...
Star Wars             2      5      4        4      3    ...
Harry Potter          3      4      5        3      ?    ...
Pretty Woman          4      ?      2        ?      5    ...
Titanic               5      ?      2        1      3    ...
Lord of the Rings     ?      5      5        4      4    ...
...                  ...    ...    ...      ...    ...   ...
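The visible corner of this table can be held as a small array with missing entries. The Python sketch below uses NaN for each "?" and computes the fraction of missing ratings and the mean observed rating per movie. The variable names are illustrative assumptions, and only the five movies and five customers shown in the table are included.

```python
import numpy as np

movies = ["Star Wars", "Harry Potter", "Pretty Woman", "Titanic",
          "Lord of the Rings"]
customers = ["Anne", "Ben", "Charlie", "Doug", "Eve"]  # column labels

# Rows are movies, columns are customers; np.nan marks a "?" (no rating).
ratings = np.array([
    [2,      5,      4, 4,      3],
    [3,      4,      5, 3,      np.nan],
    [4,      np.nan, 2, np.nan, 5],
    [5,      np.nan, 2, 1,      3],
    [np.nan, 5,      5, 4,      4],
])

# Fraction of entries with no rating.
missing_fraction = np.isnan(ratings).mean()

# Mean rating per movie, ignoring the missing entries.
movie_means = np.nanmean(ratings, axis=1)

for movie, mean in zip(movies, movie_means):
    print(f"{movie}: {mean:.2f}")
print(f"missing: {missing_fraction:.0%}")
```

In this 5 x 5 corner, 5 of the 25 entries are missing, far fewer than in the full data set.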


Netflix Prize

Challenge: Predict ratings for unrated movies with a 10% improvement over Cinematch.

Training Set: Ratings from roughly 480,000 customers on roughly 18,000 movies.

Around 98.7% missing ratings!

$1,000,000 prize!

Contest: October 2006 - August 2009.

Winners: Team led by Robert Bell and Yehuda Koren.

Methods: Variations on the SVD and k-nearest neighbors (Bell & Koren, 2008).

Fields: Recommender systems & Collaborative filtering.
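To give a rough feel for how an SVD can be used to predict missing ratings, the Python sketch below fills the "?" entries of the small movie table by iterating a low-rank approximation. This is a generic iterative SVD imputation written for illustration, not the prize-winning method. The function name `svd_impute`, the rank of 2, and the iteration count are all assumptions.

```python
import numpy as np

def svd_impute(ratings, rank=2, n_iter=100):
    """Fill NaN entries by iterating a rank-`rank` SVD approximation."""
    observed = ~np.isnan(ratings)
    # Start by filling each missing entry with its row (movie) mean.
    row_means = np.nanmean(ratings, axis=1, keepdims=True)
    filled = np.where(observed, ratings, row_means)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # Keep observed ratings fixed; update only the missing entries.
        filled = np.where(observed, ratings, low_rank)
    return filled

# Toy ratings (rows = movies, columns = customers); np.nan marks "?".
ratings = np.array([
    [2,      5,      4, 4,      3],
    [3,      4,      5, 3,      np.nan],
    [4,      np.nan, 2, np.nan, 5],
    [5,      np.nan, 2, 1,      3],
    [np.nan, 5,      5, 4,      4],
])

completed = svd_impute(ratings)
print(np.round(completed, 1))
```

The filled-in values are not constrained to the 1 - 5 scale, so a practical version would clip or round them, and would choose the rank by holding out some observed ratings.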


Visualizing Netflix Data

Zipf

(Justin S. Dyer & Art B. Owen, 2010)


Visualizing Netflix Data

Copulas

(Justin S. Dyer & Art B. Owen, 2010)


Other Examples

Amazon

Facebook

Yahoo!

Twitter
