Data integration: what I haven't yet achieved
DESCRIPTION
Data integration is a hot topic in bioinformatics, but the term means different things to different people. What do we think it means? Talk given at CSIRO Bioinformatics & Biostatistics group meeting, November 21 2012.
TRANSCRIPT
Data Integration: what I haven’t yet achieved
Neil Saunders
MATHEMATICS, INFORMATICS AND STATISTICS
www.csiro.au
Data integration 2 of 21
My main project
Ludwig colorectal cancer study
Data integration 3 of 21
Multiple “omics” platforms
exon expression, methylation, copy number
Data integration 4 of 21
We want to “integrate” these data
but what does that mean?
Data integration 5 of 21
Integration can mean “portals”
Data integration 6 of 21
Integration can mean “visualization”
Data integration 7 of 21
Integration can mean “correlation”
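One simple reading of "correlation" as integration is a per-gene correlation between two platforms measured on the same samples. The sketch below is an illustration only: the gene names, the dictionaries `copy_number` and `expression`, and all values are hypothetical, and it is not the method of any tool named in this talk.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one variable is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: gene -> values, with samples in the same order
# on both platforms.
copy_number = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [2.0, 2.0, 3.0, 1.0]}
expression = {"GENE_A": [2.0, 4.0, 6.0, 8.0], "GENE_B": [5.0, 5.0, 3.0, 7.0]}

# Correlate each gene that was measured on both platforms.
shared_genes = sorted(copy_number.keys() & expression.keys())
per_gene_r = {g: pearson(copy_number[g], expression[g]) for g in shared_genes}

for gene, r in per_gene_r.items():
    print(gene, round(r, 3))
```

A per-gene coefficient like this only describes pairwise agreement between two platforms; it says nothing about a third platform or about differences between sample groups.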
Data integration 8 of 21
What do we think integration means?
A + B + C
More information when combined than when separate
Data integration 9 of 21
What’s already “out there”? PubMed
[Figure: PubMed search for "data integration", articles per 100 000 by year, 2002 to 2010; y-axis ticks at 4, 8 and 12]
Data integration 10 of 21
What’s already “out there”? CiteULike
http://www.citeulike.org/user/neils/tag/integration
Data integration 11 of 21
Buzz-word compliant
Data integration 12 of 21
Quote from integIRTy paper
These methods can be roughly grouped into four categories: stepwise, regression-based, correlation-based and latent variable models
integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory
Bioinformatics, Vol. 28, No. 22. (15 November 2012), pp. 2861-2869
Data integration 13 of 21
Regression: SIM
Integrated analysis of DNA copy number and gene expression microarray data using gene sets
BMC Bioinformatics 2009, 10:203
Data integration 14 of 21
Correlation: DR-Integrator
[Figure: DR-Integrator output; recoverable labels are a "Correlation" scale from 0 to 1, a "Chr" axis covering chromosomes 1 to 22, and three-digit numeric labels from 001 to 153]
Data integration 15 of 21
Latent variable: iCluster
(file under impractical)
Data integration 16 of 21
Basics that are never explained 1/2
Integration across groups or description of samples?
Data integration 17 of 21
Basics that are never explained 2/2
Genes x Samples
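The "Genes x Samples" layout matters in practice because each platform can arrive with its own set of genes and samples, and the matrices need matching rows and columns before they can be compared. A minimal sketch, assuming each platform is held as a hypothetical nested dictionary keyed by gene and then by sample (the function name `align` and all identifiers and values are invented for illustration):

```python
def align(platforms):
    """Restrict every genes x samples matrix to the shared genes and samples.

    platforms: dict of platform name -> {gene: {sample: value}}.
    Returns (genes, samples, aligned), where aligned[name] is a list of
    rows (one per gene) with columns in the order given by `samples`.
    """
    if not platforms:
        raise ValueError("at least one platform is required")

    # Genes present on every platform.
    shared_genes = set.intersection(*(set(m) for m in platforms.values()))
    if not shared_genes:
        return [], [], {name: [] for name in platforms}

    # Samples measured for every shared gene on every platform.
    shared_samples = set.intersection(
        *(set(m[g]) for m in platforms.values() for g in shared_genes)
    )

    genes = sorted(shared_genes)
    samples = sorted(shared_samples)
    aligned = {
        name: [[m[g][s] for s in samples] for g in genes]
        for name, m in platforms.items()
    }
    return genes, samples, aligned


# Hypothetical toy data for two platforms with partial overlap.
platforms = {
    "expression": {
        "GENE_A": {"S1": 5.1, "S2": 6.3, "S3": 4.8},
        "GENE_B": {"S1": 2.2, "S2": 2.9, "S3": 3.1},
    },
    "methylation": {
        "GENE_A": {"S2": 0.2, "S3": 0.7},
        "GENE_C": {"S2": 0.5, "S3": 0.4},
    },
}

genes, samples, aligned = align(platforms)
print(genes, samples)
print(aligned)
```

Taking the intersection is the most conservative choice; it discards any gene or sample that is missing from one platform, which may or may not be acceptable for a given study.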
Data integration 18 of 21
Conclusions 1/3
We’re not the first people doing this...
...but it’s becoming a “hot topic”
Data integration 19 of 21
Conclusions 2/3
Room for improvement in software, much of which is:
• Poorly written
• Poorly documented
• Difficult to implement
Data integration 20 of 21
Conclusions 3/3
Too much for one individual!
CSIRO Mathematics, Informatics and Statistics
Neil Saunders
t +61 2 9325 3144
e [email protected]
Mathematics, Informatics and Statistics web