Data Integration: What I Haven't Yet Achieved

Uploaded by neil-saunders on 24 January 2015

DESCRIPTION

Data integration is a hot topic in bioinformatics, but the term means different things to different people. What do we think it means? Talk given at the CSIRO Bioinformatics & Biostatistics group meeting, November 21, 2012.

TRANSCRIPT

Page 1: Data Integration: What I Haven't Yet Achieved

Data Integration: what I haven’t yet achieved

Neil Saunders

MATHEMATICS, INFORMATICS AND STATISTICS

www.csiro.au

Data integration 2 of 21

My main project

Ludwig colorectal cancer study

Data integration 3 of 21

Multiple “omics” platforms

exon expression, methylation, copy number

Data integration 4 of 21

We want to “integrate” these data

but what does that mean?

Data integration 5 of 21

Integration can mean “portals”

Data integration 6 of 21

Integration can mean “visualization”

Data integration 7 of 21

Integration can mean “correlation”
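A minimal sketch of what "correlation" integration can look like in practice, assuming both platforms (here copy number and expression) have already been summarised to one value per gene per sample, with samples in the same order on both. The function names, gene names and values are hypothetical, and this is a plain per-gene Pearson correlation, not the method of any particular package.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined if either platform is constant for this gene
    return cov / (sx * sy)


def per_gene_correlation(copy_number, expression):
    """Correlate two platforms gene by gene.

    Both arguments map gene -> list of values, one per sample, with samples
    in the same order on both platforms (assumed, not checked).
    Only genes present on both platforms are compared.
    """
    shared = sorted(set(copy_number) & set(expression))
    return {g: pearson(copy_number[g], expression[g]) for g in shared}


# Hypothetical toy data: four samples, illustrative gene names only.
cn = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [2.0, 2.0, 2.0, 2.1]}
ex = {
    "GENE_A": [5.1, 6.0, 7.2, 7.9],
    "GENE_B": [3.0, 1.0, 4.0, 2.0],
    "GENE_C": [1.0, 2.0, 3.0, 4.0],  # not on the copy number platform: skipped
}
result = per_gene_correlation(cn, ex)
for gene, r in result.items():
    print(gene, round(r, 3))
```

Pearson is used here only for simplicity; a rank-based (Spearman) correlation is a common alternative that is less sensitive to outliers.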

Data integration 8 of 21

What do we think integration means?

A + B + C

More information when combined than when separate

Data integration 9 of 21

What’s already “out there”? PubMed

[Figure: PubMed Search: "data integration"; x-axis Year (2002 to 2010), y-axis articles / 100 000]

Data integration 10 of 21

What’s already “out there”? CiteULike

http://www.citeulike.org/user/neils/tag/integration

Data integration 11 of 21

Buzzword compliant

Data integration 12 of 21

Quote from integIRTy paper

"These methods can be roughly grouped into four categories: stepwise, regression-based, correlation-based and latent variable models"

integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory. Bioinformatics, Vol. 28, No. 22 (15 November 2012), pp. 2861-2869

Data integration 13 of 21

Regression: SIM

Integrated analysis of DNA copy number and gene expression microarray data using gene sets. BMC Bioinformatics 2009, 10:203
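SIM itself is not reproduced here. As a minimal, generic illustration of the regression-based category, the sketch below fits an ordinary least squares line of expression on copy number for each gene. The function name `ols_fit`, the gene names and the values are all hypothetical, and the two platforms are assumed to be matched by sample.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical per-gene data: (copy number, expression), matched by sample.
genes = {
    "GENE_A": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "GENE_B": ([2.0, 3.0, 2.0, 3.0], [5.0, 5.0, 5.0, 5.0]),
}
fits = {g: ols_fit(cn, expr) for g, (cn, expr) in genes.items()}
for gene, (intercept, slope) in fits.items():
    print(gene, "intercept", intercept, "slope", slope)
```

A slope near zero (as for the hypothetical GENE_B) suggests expression does not track copy number for that gene; methods such as SIM go further, for example by testing groups of genes jointly.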

Data integration 14 of 21

Correlation: DR-Integrator

[Figure: DR-Integrator plot; axes labelled Correlation (0 to 1) and Chr (1 to 22)]

Data integration 15 of 21

Latent variable: iCluster

(file under impractical)

Data integration 16 of 21

Basics that are never explained 1/2

Integration across groups or description of samples?

Data integration 17 of 21

Basics that are never explained 2/2

Genes x Samples
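One of those basics, sketched under assumptions: before any joint analysis, each platform is arranged as a genes × samples table and restricted to the genes and samples shared by every platform, with columns in a common order. The helper `align_platforms` and the toy data are hypothetical, and mapping probes or CpG sites to genes is assumed to have been done already.

```python
def align_platforms(platforms):
    """Restrict several genes x samples tables to shared genes and samples.

    `platforms` maps platform name -> {gene: {sample: value}}.
    Returns (genes, samples, matrices): each matrix is a list of rows
    (one per gene) with columns in the same sample order on every platform.
    """
    tables = list(platforms.values())
    # Genes measured on every platform.
    genes = set(tables[0])
    for t in tables[1:]:
        genes &= set(t)
    # Samples measured for every shared gene on every platform.
    samples = None
    for t in tables:
        for g in genes:
            s = set(t[g])
            samples = s if samples is None else samples & s
    genes = sorted(genes)
    samples = sorted(samples or [])
    matrices = {
        name: [[t[g][s] for s in samples] for g in genes]
        for name, t in platforms.items()
    }
    return genes, samples, matrices


# Hypothetical toy data: copy number is missing sample S1 and has an extra gene.
expression = {
    "GENE_A": {"S1": 7.2, "S2": 6.1, "S3": 5.0},
    "GENE_B": {"S1": 3.3, "S2": 4.4, "S3": 2.2},
}
copy_number = {
    "GENE_A": {"S2": 2.0, "S3": 3.0},
    "GENE_B": {"S2": 1.0, "S3": 2.0},
    "GENE_C": {"S2": 2.0, "S3": 2.0},
}
genes, samples, m = align_platforms(
    {"expression": expression, "copy_number": copy_number}
)
print(genes, samples)
```

Taking the intersection is the simplest choice; it silently drops genes and samples, so it is worth reporting how much was lost on each platform.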

Data integration 18 of 21

Conclusions 1/3

We’re not the first people doing this...

...but it’s becoming a “hot topic”

Data integration 19 of 21

Conclusions 2/3

Room for improvement in software, much of which is:

• Poorly written

• Poorly documented

• Difficult to implement

Data integration 20 of 21

Conclusions 3/3

Too much for one individual!

CSIRO Mathematics, Informatics and Statistics

Neil Saunders

t +61 2 9325 3144

e [email protected]

Mathematics, Informatics and Statistics