
Upload: michael-lambert

Post on 19-Jan-2016


University at Buffalo The State University of New York

Visualization and Microarray

• Complement to numerical analysis

• Offers insightful information

• Detects the structure of dataset

• Early / late stage of data mining

• Challenges of Microarray Visualization

– High dimensionality

– Large data size

– Intuitive layout

– Low time complexity

An Example – Early Stage

General Approaches

• Global Visualizations

– Encode each dimension uniformly by the same visual cue

– Parallel coordinates
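A minimal sketch of how parallel coordinates encode every dimension by the same visual cue: each dimension becomes one vertical axis, and a sample becomes a polyline whose height on axis k is the scaled value of dimension k. The function name `parallel_coordinate_polylines` and the per-dimension min-max scaling are assumptions made for illustration; they are not taken from the slides.

```python
def parallel_coordinate_polylines(samples):
    """Return one polyline per sample as a list of (axis_index, height) pairs.

    Heights are min-max scaled per dimension to [0, 1]. A constant
    dimension is drawn at height 0.5 (an arbitrary illustrative choice).
    """
    n_dims = len(samples[0])
    lows = [min(s[k] for s in samples) for k in range(n_dims)]
    highs = [max(s[k] for s in samples) for k in range(n_dims)]
    polylines = []
    for s in samples:
        line = []
        for k in range(n_dims):
            span = highs[k] - lows[k]
            height = 0.5 if span == 0 else (s[k] - lows[k]) / span
            line.append((k, height))
        polylines.append(line)
    return polylines


# Three samples with three dimensions; the last dimension is constant.
data = [[1.0, 10.0, 5.0], [3.0, 20.0, 5.0], [2.0, 40.0, 5.0]]
lines = parallel_coordinate_polylines(data)
```

Every sample yields one polyline across all axes, which is why large data sets overlap heavily in this kind of display.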

General Approaches, cont’d

• Optimal Visualizations

– Estimate the parameters and assess the fit of various spatial distance models for proximity data

– Multidimensional scaling (MDS)

• Sammon’s mapping: topology preservation. Two samples that are close to each other have to stay close when projected.

Sammon’s mapping

• Sammon’s mapping is a classical case of MDS

• MDS optimizes the 2-D presentation to preserve distances in the original N-dimensional space

• Sammon’s mapping iteratively minimizes

$$E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}$$

where $d^{*}_{ij}$ is the distance between points i and j in the N-dimensional space and $d_{ij}$ is the distance between points i and j in the visualization.
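The error E can be evaluated directly from the two sets of pairwise distances. The sketch below assumes Euclidean distances and distinct points in the original space (so that no original distance is zero); the function names are illustrative only.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def sammon_stress(high, low):
    """Sammon's error E for points `high` (original space) and `low` (projection)."""
    total = 0.0     # sum of original distances over pairs i < j
    weighted = 0.0  # sum of (d*_ij - d_ij)^2 / d*_ij over pairs i < j
    n = len(high)
    for i in range(n):
        for j in range(i + 1, n):
            d_star = euclidean(high[i], high[j])
            d = euclidean(low[i], low[j])
            total += d_star
            weighted += (d_star - d) ** 2 / d_star
    return weighted / total


# Three collinear points in 2-D projected onto a line.
high = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
exact = sammon_stress(high, [(0.0,), (5.0,), (10.0,)])    # distances preserved
collapsed = sammon_stress(high, [(0.0,), (5.0,), (5.0,)])  # last two points merged
```

A projection that preserves every pairwise distance gives E = 0; merging two points that were apart in the original space raises E.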

2D to 1D

A method for achieving this projection

1. D1, D2 and D3 (the interpoint distances in the higher dimensional space) are calculated.

2. P1', P2' and P3' are generated randomly in the lower dimensional space.

3. The mapping error, E, is calculated for all the interpoint distances in the lower dimensional space.

4. The gradient showing the direction which minimizes the error is calculated.

5. The points in the lower dimensional space are moved according to the direction given by the gradient.

6. Steps 3 to 5 are repeated until E is below a given limit.
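The six steps can be sketched as a small gradient-descent loop. This is an illustration under stated assumptions, not the procedure used to produce the slides: it uses plain gradient descent with a fixed step size (Sammon's original method uses a second-order update), Euclidean distances, and distinct input points. The names `sammon`, `step`, `limit` and `max_iter` are hypothetical.

```python
import math
import random


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def stress(d_star, pts):
    """Mapping error E for projected points `pts` given original distances."""
    n = len(pts)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    c = sum(d_star[i][j] for i, j in pairs)
    e = sum((d_star[i][j] - dist(pts[i], pts[j])) ** 2 / d_star[i][j]
            for i, j in pairs)
    return e / c


def sammon(high, dim=2, step=0.1, limit=1e-4, max_iter=2000, seed=0):
    rng = random.Random(seed)
    n = len(high)
    # Step 1: interpoint distances in the higher dimensional space.
    d_star = [[dist(high[i], high[j]) for j in range(n)] for i in range(n)]
    c = sum(d_star[i][j] for i in range(n) for j in range(i + 1, n))
    # Step 2: random starting points in the lower dimensional space.
    pts = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    history = []
    for _ in range(max_iter):
        # Step 3: mapping error E for the current projection.
        e = stress(d_star, pts)
        history.append(e)
        if e < limit:  # Step 6: stop once E is below the given limit.
            break
        # Step 4: gradient of E with respect to every projected point.
        grads = []
        for i in range(n):
            g = [0.0] * dim
            for j in range(n):
                if i == j:
                    continue
                d = max(dist(pts[i], pts[j]), 1e-12)  # guard against d = 0
                w = (d_star[i][j] - d) / (d_star[i][j] * d)
                for k in range(dim):
                    g[k] += -2.0 / c * w * (pts[i][k] - pts[j][k])
            grads.append(g)
        # Step 5: move the points against the gradient.
        for i in range(n):
            for k in range(dim):
                pts[i][k] -= step * grads[i][k]
    return pts, history


# A square pyramid in 3-D, projected to 2-D.
high = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0.5, 0.5, 1)]
pts, history = sammon(high)
```

The result depends on the random start, and every iteration visits all point pairs.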

Sammon’s mapping, cont’d

• Some drawbacks

– Computationally intensive, time complexity O(n²)

– How to determine the best initialization

– No user interaction is permitted

– Adding new data points requires rerunning the process to get a new minimized projection

– Information loss

General Approaches, cont’d

• Projective Visualizations

– Use projection functions to achieve a low dimensional display

– Radial Visualizations

• RadViz

• Star Coordinates

• VizStruct

Comparison of Approaches

Approach | Advantages | Disadvantages
Global visualization | Displays all dimensional information, no computation | Severe overlapping, large space to display
Optimal visualization | Achieves optimal result, sound theoretical basis | Lacks user interaction, heavy computation
Projective visualization | Concise display, little computation | Lacks rigorous proof, may not be optimal

Challenges of Microarray Visualization

• High dimensionality

• Large data size

• Intuitive layout

• Low time complexity

Density or Heat Plots

[Figure: heat plot. Axes: Genes vs. Sample. Panels: Before IFN, After IFN. Scale labels: 0, 1, Increased.]

• Widely used with arrays

• Works well only for structured data

• Quantitative information is lost

• Gets easily cluttered

TreeView Visualization

Principal component analysis

PCA:

• Linear projection of the data onto the major principal components, which are defined by the eigenvectors of the covariance matrix.

• PCA is also used for reducing the dimensionality of the data.

• Criterion to be minimised: the square of the distance between the original and projected data. This is fulfilled by the Karhunen-Loève transformation

$$x \mapsto P x$$

$$C = \frac{1}{n-1} \sum_{i} (x_i - \mu)(x_i - \mu)^{t}$$

where P is composed of the eigenvectors of the covariance matrix C and $\mu$ is the sample mean.
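A minimal sketch of the projection onto the first principal component. It builds the sample covariance matrix and finds its leading eigenvector by power iteration; power iteration is a choice made here to keep the example dependency-free, not something stated in the slides, and a full PCA would compute all eigenvectors. The function names and the toy data are assumptions.

```python
import math


def covariance(rows):
    """Return the sample mean and the (n - 1)-normalised covariance matrix."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def leading_eigenvector(cov, iterations=500):
    """Power iteration: repeatedly apply cov and renormalise."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumes the start is not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(rows, mean, axis):
    """Coordinate of each centred row along the given unit axis."""
    return [sum((r[k] - mean[k]) * axis[k] for k in range(len(axis)))
            for r in rows]


# Toy data lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, cov = covariance(rows)
axis = leading_eigenvector(cov)
scores = project(rows, mean, axis)
```

For this toy data the first component points along the direction (1, 2), so a single coordinate per sample captures all of the variance.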

Example: Leukemia data sets by Golub et al.: Classification of ALL and AML

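Sammon’s mapping:

• Non-linear multi-dimensional scaling methods such as Sammon’s mapping aim to optimally conserve the distances of a higher-dimensional space in the 2- or 3-dimensional space.

• Mathematically: minimisation of the error function E by the steepest descent method:

$$E = \frac{1}{\sum_{i<j}^{N} D_{ij}} \sum_{i<j}^{N} \frac{\left(D_{ij} - d_{ij}\right)^{2}}{D_{ij}}$$

where $D_{ij}$ is the distance between samples i and j in the original space and $d_{ij}$ is their distance in the projection.

Multi-linear scaling

Example: DLBCL prognosis – cured vs fatal cases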

Our Visualization Approach

Gene Space

Sample Space

Fourier Harmonic Projection

Geometric Interpretation

N-dimensional space Two-dimensional space

An Example of the Mapping

P=[a,a,…a] -> ?

First Fourier Harmonic Projection

N-dimensional space Two-dimensional space
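The slides show this mapping only as a figure. The sketch below assumes it is the first harmonic of the discrete Fourier transform, F1(x) = Σ x[n]·exp(−i2πn/N), whose real and imaginary parts give the two plot coordinates; the sign and normalisation convention actually used by VizStruct may differ. The function name `first_harmonic` is illustrative.

```python
import cmath


def first_harmonic(x):
    """Map an N-dimensional real vector to one complex number (a 2-D point)."""
    n_dims = len(x)
    return sum(value * cmath.exp(-2j * cmath.pi * n / n_dims)
               for n, value in enumerate(x))


unit_first = first_harmonic([1.0, 0.0, 0.0, 0.0])   # only dimension 0 is set
unit_second = first_harmonic([0.0, 1.0, 0.0, 0.0])  # only dimension 1 is set
constant = first_harmonic([3.0, 3.0, 3.0, 3.0])     # a vector of the form [a, a, ..., a]
```

Under this assumption each dimension pulls the point towards its own direction on the unit circle, and a constant vector lands at the origin because the N unit directions sum to zero.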

Analytical Properties

Scaling and Transpose Property

Original

Shift

Scaling

Transpose
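The panels on this slide are figures, so the exact definitions are not recoverable. As a hedged illustration, the sketch below assumes the first-harmonic DFT projection and reads "shift" as adding a constant to every dimension, "scaling" as multiplying every dimension by a constant, and "transpose" as reversing the order of the dimensions. All three readings are assumptions.

```python
import cmath


def first_harmonic(x):
    n_dims = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * n / n_dims)
               for n, v in enumerate(x))


x = [2.0, 5.0, 1.0, 7.0, 3.0]
f = first_harmonic(x)

shifted = first_harmonic([v + 10.0 for v in x])  # add a constant to every dimension
scaled = first_harmonic([3.0 * v for v in x])    # multiply every dimension by 3
flipped = first_harmonic(x[::-1])                # reverse the dimension order

# For real data, reversing the order mirrors the point and rotates it by 2*pi/N.
rotation = cmath.exp(2j * cmath.pi / len(x))
expected_flipped = rotation * f.conjugate()
```

Under these assumptions, adding a constant leaves the mapped point unchanged, scaling moves it radially by the same factor, and reversal produces a mirrored, rotated point.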

Time Shifting Property
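A small sketch of the time shifting property, assuming the first-harmonic DFT projection and assuming "time shifting" means a circular shift of the series (both are assumptions; the slide itself shows only a figure). Delaying the series by k positions then rotates the mapped point by the angle −2πk/N without changing its distance from the origin.

```python
import cmath


def first_harmonic(x):
    n_dims = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * n / n_dims)
               for n, v in enumerate(x))


def circular_shift(x, k):
    """Delay the series by k positions, wrapping around the end."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


x = [2.0, 5.0, 1.0, 7.0, 3.0]
f = first_harmonic(x)
delayed = first_harmonic(circular_shift(x, 2))
expected = cmath.exp(-2j * cmath.pi * 2 / len(x)) * f
```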

Visual Exploration Framework

• Explorative Visualization – Sample space

• Confirmative Visualization – Gene space

VizStruct Architecture

[Figure: architecture diagram. Labelled components: Clients (Web Browsers), Internet, Web Server, Matlab Web Server, Matlab Libraries, Matlab Applications, Intranet.]

VizStruct User Interface

VizStruct User Interface (3)

Cartesian Plot Polar plot

VizStruct User Interface (2)

EM Mixture Density contour

Sample Classification

Binary Classification

Leukemia-A: 72 samples with 7129 genes; 38 (27+11) training, 34 (20+14) testing; hold-out evaluation

Multiple Sclerosis: 44 samples, 4132 genes; MS_IFN (28), MS_CON (30); cross-validation evaluation

Binary classification: two sample classes

Evaluation: hold out and cross validation
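The two evaluation schemes differ only in how sample indices are divided. The sketch below is illustrative: the function names are hypothetical, there is no shuffling or class stratification, and the number of folds is an arbitrary choice because the slides do not state one.

```python
def hold_out_split(n_samples, n_train):
    """First n_train indices for training, the rest for testing."""
    indices = list(range(n_samples))
    return indices[:n_train], indices[n_train:]


def k_fold_splits(n_samples, k):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


train, test = hold_out_split(72, 38)  # sizes from the Leukemia-A data set
folds = list(k_fold_splits(44, 4))    # k = 4 is an assumption for illustration
```

Hold-out evaluation fixes one training and one testing set, while cross-validation rotates the testing role across the folds.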

Multiple Classification

Breast Cancer: 22 samples with 3226 genes; 3 classes: BRCA1 (7), BRCA2 (8), Sporadic (7); cross-validation evaluation

SRBCT: 88 samples with 2308 genes; 4 classes: RMS, BL, NB, EWS; 63 training and 25 testing

Classification Summary

Temporal Pattern (1)

[Figure panels: 10-OH Nortriptyline, Nortriptyline]

Temporal Pattern (2)

• The rat kidney data set of Stuart et al. (2001) contains 873 genes measured at 7 time points during kidney development

• There are 5 patterns, or gene groups, as classified by the authors

• Parallel coordinates show that the actual data comply with the profiles, with some noise

Parallel coordinates for each of the gene groups

Idealized temporal gene expression profiles

Temporal Pattern (3)

Genes having very high relative levels of expression in early development

Genes having a relatively steady increase in expression throughout development

The first Fourier harmonic projection

Genes are roughly symmetric about the middle time point, i.e., they are transposes of each other

Genes are very similar except the last time point

VizStruct vs. Sammon’s Mapping

[Figure: two scatter plots, each showing 150 numbered points. Panel titles: VizStruct; Sammon's Mapping. Axes in both panels: Real Part of F1(x[n]) (horizontal), Imaginary Part of F1(x[n]) (vertical).]

• VizStruct is similar to Sammon’s mapping

VizStruct - Dimension Tour

Interactively adjust dimension parameters

Manually or automatically

May cause false clusters to break

Create dynamic visualization
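One way to picture the dimension tour is to give every dimension an adjustable weight before projecting. The sketch below is an assumption made for illustration: it treats the "dimension parameters" as per-dimension multipliers applied inside a first-harmonic DFT projection, which the slides do not confirm. The name `weighted_first_harmonic` is hypothetical.

```python
import cmath


def weighted_first_harmonic(x, weights):
    """First-harmonic projection with one adjustable weight per dimension."""
    n_dims = len(x)
    return sum(w * v * cmath.exp(-2j * cmath.pi * n / n_dims)
               for n, (v, w) in enumerate(zip(x, weights)))


# Two different samples that happen to coincide under uniform weights.
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 0.0, 7.0, 2.0]

uniform = [1.0, 1.0, 1.0, 1.0]
tour = [1.0, 1.0, 0.5, 1.0]  # halve the weight of the third dimension

before = abs(weighted_first_harmonic(a, uniform) - weighted_first_harmonic(b, uniform))
after = abs(weighted_first_harmonic(a, tour) - weighted_first_harmonic(b, tour))
```

Here the two samples overlap in the first view and separate once one weight changes, which illustrates how adjusting the parameters can break a false cluster.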

Visualized Results for a Time Series Data Set

Interrelated Dimensional Clustering

The approach is applied to classifying multiple-sclerosis patients and IFN-drug-treated patients.

– (A) Shows the original distribution of the 28 samples. Each point represents a sample, mapped from the sample's intensity vector over 4132 genes.

– (B) Shows the distribution of the 28 samples on 2015 genes.

– (C) Shows the distribution of the 28 samples on 312 genes.

– (D) Shows the distribution of the same 28 samples after using our approach, which reduces the 4132 genes to 96 genes.

References

• Li Zhang, Aidong Zhang, and Murali Ramanathan. VizStruct: Exploratory Visualization for Gene Expression Profiling. Bioinformatics, 20: 85-92, 2004.

• Li Zhang, Chun Tang, Yuqing Song, Aidong Zhang, and Murali Ramanathan. VizCluster and Its Application on Clustering Gene Expression Data. International Journal of Distributed and Parallel Databases, 13(1): 73-97, 2003.

• Li Zhang, Aidong Zhang, and Murali Ramanathan. Enhanced Visualization of Time Series through Higher Fourier Harmonics. In Proceedings of BIOKDD 2003, Washington DC, August 2003, pp. 49-56.

• Li Zhang, Aidong Zhang, and Murali Ramanathan. Fourier Harmonic Approach for Visualizing Temporal Patterns of Gene Expression Data. In Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB 2003), Stanford, CA, August 2003, pp. 131-141.

• Li Zhang, Aidong Zhang, and Murali Ramanathan. Visualized Classification of Multiple Sample Types. In Proceedings of BIOKDD 2002, Edmonton, Alberta, Canada, July 2002, pp. 55-62.

• Li Zhang, Chun Tang, Yong Shi, Yuqing Song, Aidong Zhang, and Murali Ramanathan. VizCluster: An Interactive Visualization Approach to Cluster Analysis and Its Application on Microarray Data. In Proceedings of the Second SIAM International Conference on Data Mining (SDM02), Arlington, VA, April 2002, pp. 29-51.